1 Introduction


Medicine is in a period of transition. An ever-increasing amount of information is available on 
patients ranging from genetic and epigenetic profiles enabled by next-generation sequencing to 
moment-to-moment data collected by physical activity monitors. With this wealth of information 
comes the opportunity to provide more targeted healthcare including, for example, prediction 
of pre-clinical atherosclerosis (McGeachie et al., 2009), individualized cancer screening (Saini,
van Hees, and Vijan, 2014), sub-typing of scleroderma (Schulam, Wigley, and Saria, 2015), and
personalized cancer treatment (Hayden, 2009). In order to fully realize the promise of patient-focused
medicine, principled statistical methods are needed that integrate data from a variety
of sources in order to provide physicians and patients with relevant syntheses to inform their 
decision-making. These methods must also accommodate limitations common to data generated 
in an observational setting including measurement error and informative missing data patterns. 

An excellent example of this challenge is low-risk prostate cancer diagnosis. Tumor lethality is 
an aspect of an individual’s health state that is not directly observable but is manifest in multiple 
types of measurements including biomarkers, histology of biopsied tissue, genetic markers, and 
family history of the disease. Individualized predictions of the latent disease state are critical 
to guide treatment decisions. If the tumor is potentially lethal, immediate treatment (including 
surgery or radiation) can be life-saving. Yet, some tumors are indolent and not life-threatening. 
In this case, treatment is not recommended due to the risk of lasting side effects including urinary
incontinence and erectile dysfunction (Chou et al., 2011).

Active surveillance (AS) offers an alternative to early treatment for individuals with lower risk
disease (Dall’Era et al., 2012). Though AS regimens vary, the approach generally entails regular
biopsies (e.g., annually) with intervention recommended upon detection of higher risk histological
features, as determined by the Gleason grading system (Gleason, 1992). Biopsies with a Gleason
score of 6 (the minimum for prostate cancer diagnosis) indicate low risk disease, while a subsequent
Gleason score of 7 or above is considered “grade reclassification” (Tosoian et al., 2015); treatment
is recommended once grade reclassification is observed. Prostate-specific antigen (PSA), a blood
serum biomarker of inflammation in the prostate, is also routinely measured and may be used as
the basis for a biopsy recommendation.
The success of AS programs depends on clinicians’ ability to identify tumors with metastatic
potential with sufficient time for curative intervention to be effective. Yet, biopsies used to
characterize tumors typically sample less than one percent of the prostate tissue and so have
imperfect sensitivity and specificity (Epstein et al., 2012). Existing decision support tools that
predict biopsy outcomes for AS patients (including, most recently, Ankerst et al. (2015)) provide
patients and physicians with valuable information to guide decisions about biopsy timing and
frequency but are insufficient to directly address patients’ primary concerns about their tumors’
lethality. Patients and clinicians need predictions of the pathological make-up of the entire
prostate to guide their decision-making.

With this application in mind, we have developed a Bayesian hierarchical model that enables
prediction of an individual’s underlying disease state via joint modeling of repeated PSA measurements
and biopsies. Specifically, we predict a binary cancer state, indolent or aggressive,
with the latter defined as a “true” Gleason score of 7 or higher. Predictions are informed by a subset
of patients for whom the true state is observed: patients who, either before or after biopsy
grade reclassification, chose to undergo prostatectomy and have post-surgery, entire-prostate
Gleason score determinations. In this sense, cancer state operates as a partially-latent class in
the proposed model (Wu et al., 2015).

An individual’s cancer state is assumed to be manifest in both the level and trajectory of
PSA measurements as well as in the outcomes from repeated biopsies. These relationships are
illustrated by the directed acyclic graph (DAG) in Figure 1(a). In the model we are proposing,
PSA measurements follow a multilevel model with mean intercept and age effects varying across
latent classes. Repeated annual biopsies, in turn, constitute a time-to-event outcome, since patients
exit AS after grade reclassification on biopsy. Time until reclassification on biopsy is therefore
modeled using pooled logistic regression under the assumption that biopsy results are independent
conditional on cancer state and covariates (Cupples et al., 1988). Pooled logistic regression provides
survival estimates equivalent to those of a time-varying Cox model for discrete event times and
conditionally independent intervals (D’Agostino et al., 1990). As indicated in Figure 1(a), PSA
and biopsy results are also assumed to be conditionally independent given latent class.
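To make the pooled logistic regression idea concrete, the following minimal Python sketch (an illustration, not the authors' code) expands one patient's follow-up into person-period rows, one per annual interval, and converts interval-specific event probabilities into a discrete-time survival curve. The function names `person_period_rows`, `expit`, and `survival_from_hazards` are hypothetical; in practice the interval hazards would come from a logistic regression fit to the stacked rows of all patients.

```python
import numpy as np


def person_period_rows(event_interval, n_intervals):
    """Expand one patient's follow-up into (interval, outcome) rows.

    event_interval: 1-based interval in which the event (e.g., grade
        reclassification) occurred, or None if the patient was censored.
    n_intervals: number of intervals of follow-up for a censored patient.
    The outcome is 1 only in the event interval; no rows follow the event.
    """
    last = event_interval if event_interval is not None else n_intervals
    return [(j, int(j == event_interval)) for j in range(1, last + 1)]


def expit(x):
    """Inverse logit: maps a linear predictor to an interval event probability."""
    return 1.0 / (1.0 + np.exp(-np.asarray(x, dtype=float)))


def survival_from_hazards(hazards):
    """Discrete-time survival S(j) = prod_{l <= j} (1 - h_l)."""
    return np.cumprod(1.0 - np.asarray(hazards, dtype=float))


# Reclassified in interval 3 -> rows [(1, 0), (2, 0), (3, 1)]
rows = person_period_rows(3, n_intervals=5)
# Hypothetical linear predictors for three intervals, converted to hazards
surv = survival_from_hazards(expit([-2.0, -1.5, -1.0]))
```

Stacking these rows across patients and fitting an ordinary logistic regression to the outcome column is what makes the model "pooled": each interval a patient remains under observation contributes one Bernoulli observation.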


The model depicted in Figure 1(a) is related to previous work by Lin et al. (2002), who
proposed a joint latent class model (JLCM) to analyze longitudinal PSA and time-to-diagnosis of
prostate cancer, extending earlier joint models by Schluchter (1992), DeGruttola and Tu (1994),
and Henderson, Diggle, and Dobson (2000). Inoue et al. (2008) used a Bayesian approach
to jointly model PSA and time-to-diagnosis at various stages of disease in order to estimate
the underlying natural history process for prostate cancer initiation and progression. Proust-Lima
and Taylor (2009) developed a dynamic extension of the JLCM to predict prostate cancer
recurrence after radiation therapy.

[Figure 1 panels: (a) Time-varying scenario; (b) Informative observation process scenarios.]
Figure 1: DAGs describing the relationships between latent class (circled) and clinical outcomes
(squared).


A credible statistical solution to the active surveillance problem requires three extensions of 
existing latent variable models for multivariate outcomes (such as the JLCM). First, the model
must accommodate measurement error inherent in monitoring disease state. In our approach, we 
focus prediction on a partially-observed true Gleason score, instead of relying on biopsy Gleason 
scores for accurate characterization of the latent health state, and model a stochastic, rather 
than deterministic, relationship between the two. 


Second, the model must allow disease monitoring to reflect patterns of clinical practice, including
discrete, possibly informative, observation times. Specifically, prostate biopsies are scheduled
to occur annually, but patients may opt to forgo the procedure. Our method replaces the JLCM’s
usual survival model for right-censored outcomes with a pooled logistic regression model
for biopsy grade reclassification in which the possibility of reclassification in any year is conditional
on a biopsy being performed. Furthermore, it is possible that the choice to receive a biopsy depends
on the true cancer state or, more generally, that unobserved confounding exists, as shown
by the dotted arrow from true cancer state to the “Biopsy Performed” node in Figure 1(b). If
so, biopsy results are missing not at random (MNAR), and predictions of the true state that
ignore the MNAR mechanism will be biased (Little and Rubin, 2014). In response, our approach
also includes a regression model for the probability of receiving a biopsy in each interval; the
occurrence of a biopsy is allowed to depend on the latent health state, as well as previous biopsy
and PSA observations.


Third, the active surveillance model must allow surgical removal of the prostate and subsequent
observation of the underlying cancer state to be informative of that latent state. Consider
the dotted arrow from true cancer state to the “Surgical Removal” node in Figure 1(b). If,
after conditioning on clinical observations, an individual’s true cancer state is associated with
his choice to undergo surgery, whether through direct causation or unmeasured confounding,
then informative missingness is present, and failure to accommodate this in the model will result
in biased predictions of the cancer state. While the association between the true cancer state
and a binary indicator of its observation is not identifiable, we propose to model the time until
surgery (and true state observation) conditional on the latent state. Evidence of this relationship
among the subset of patients with surgery, together with mild assumptions on the structure of the
hazard function (such as an additive or smooth effect), provides identifiability. This approach shares
an intuition with missing data models for repeated attempt designs, in which the estimated
association between the number of attempts needed to elicit a response and its value is used to
account for outcomes suspected to be MNAR (Jackson et al., 2012). In this application, patients
have the opportunity to elect surgery throughout their participation. For simplicity, we also
model the time until surgery with a pooled logistic regression model.


This paper is organized as follows. In Section 2, a hierarchical model for latent class prediction
is described and estimation procedures are outlined. In Section 3, we specify our model to predict
latent cancer states for patients in the Johns Hopkins Active Surveillance cohort and outline a
simulation study based on this application. Results are presented in Section 4. We close with a
discussion.


2 Hierarchical Latent Class Model 

We propose a Bayesian hierarchical model of the underlying cancer state, measurement process,
and clinical outcomes of patients enrolled in active surveillance (AS). Predictions are made by
incorporating information from repeated PSA and biopsy measurements for all patients and true
cancer state observations in a potentially non-random subset of the cohort. Predictions are also
informed by whether certain observations (biopsies and surgery) occur at all, which we refer to as
an informative observation process (IOP). In this section, we introduce notation and conditional
distributions for the observed data given the latent variables and parameters, then give the
likelihood function. The model is completed by specifying appropriate priors and defining the joint
posterior distribution. Overall model structure is summarized in Figure 2.

2.1 Latent cancer state $\eta_i$ for patient $i$, $i = 1, \ldots, n$

Define individual $i$’s true cancer state, $\eta_i$, as either indolent, $\eta_i = 0$, or aggressive, $\eta_i = 1$,
$i = 1, \ldots, n$. We use the Gleason score that would be assigned if his entire prostate were to be
surgically removed and analyzed to define $\eta_i = 0$ if Gleason $= 6$ and $\eta_i = 1$ if Gleason $\geq 7$. Note
that this definition assumes that cancer state is constant during the time under consideration.
This assumption is discussed in more detail in Section 5.

True cancer state is then modeled as a Bernoulli random variable, $\eta_i \sim \mathrm{Bern}(p_i)$. We assume
a shared underlying probability of aggressive cancer, $p_i = p$, for simplicity in initial presentation.
We observe this true cancer state on a possibly non-random subset of patients who choose surgical
removal of the prostate and, hence, $\eta_i$ is a partially-latent variable.

2.2 Longitudinal data $Y_{im}$ given latent class $\eta_i$, $m = 1, \ldots, M_i$

Next, we consider PSA, which is influenced by the true cancer state $\eta_i$ as well as covariates
including age and prostate volume. Unlike biopsies, PSA measurements are a routine part of each
clinic visit, so the times of observation are assumed to be independent of $\eta_i$. We use a multilevel
model to estimate the linear trend (on a log scale) of an individual’s PSA as he ages (Gelman and
Hill, 2006). Patient-level coefficients, $b_i$, vary about an $\eta_i$-specific mean intercept and slope.
Specification follows that of a hierarchically-centered multilevel model to speed convergence of
the posterior sampling algorithm (Gelfand, Sahu, and Carlin, 1995). Specifically, given $b_i$, the
log-transformed PSA for patient $i$’s $m$th visit, $Y_{im}$, is assumed equal to

$$Y_{im} = X_{im}\beta + Z_{im} b_i + \epsilon_{im},$$

where $X_{im}$ and $Z_{im}$ are covariate vectors for individual $i$ at visit $m$, $\beta$ is a parameter vector of
population-level coefficients, and the residual $\epsilon_{im}$ is assumed to follow a Gaussian distribution
with mean zero and variance $\sigma^2$. In comparison to the commonly used mixed effects model of Laird
and Ware (1982), covariates in $Z_{im}$ are not a subset of covariates in $X_{im}$; covariates corresponding
to patient-level effects $b_i$ are only included in $Z_{im}$, and the $b_i$ are not centered at zero. In our
application, $Z_{im}$ includes an intercept and age so that PSA intercepts and slopes vary across
individuals. $X_{im}$ includes prostate volume, and $\beta$ is the population-level association between
volume and log-PSA.

Modeling of patient-level coefficients follows the recommendation of Gelman and Hill (2006),
who advocate the use of a scaled inverse Wishart prior on the covariance matrix. The inverse
Wishart prior, which is commonly used for Bayesian estimation of multilevel models (Gelfand
et al., 1995), imposes dependence between the variance and correlation components of the covariance
matrix. To reduce prior dependence and allow for a flat prior on the correlation between
individual-level intercepts and slopes, O’Malley and Zaslavsky (2008) introduce a scale parameter,
$\xi$, for the patient-level random effects: $b_i = \mathrm{diag}(\xi)\,\tilde{b}_i$. Unscaled random effects, $\tilde{b}_i$, are
assumed to follow a latent-class-specific multivariate Gaussian distribution with mean vector $\mu_{\eta_i}$
and covariance matrix $\Sigma_{\eta_i}$.
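The generative structure of the PSA component can be sketched as a short simulation. This is a minimal Python illustration under assumed inputs, not the authors' implementation: `simulate_log_psa` is a hypothetical helper, the parameter values in the example are made up, and, following Section 3.2, the covariance of the unscaled effects is taken to be shared across classes. It draws the unscaled effects (written `b_tilde` in the code) around a class-specific mean, rescales them by $\mathrm{diag}(\xi)$, and adds the population-level volume effect and Gaussian residual noise.

```python
import numpy as np


def simulate_log_psa(eta, ages, volume, mu, Sigma, xi, beta, sigma, rng):
    """Draw log-PSA values Y_i1, ..., Y_iM for one patient (hypothetical helper).

    eta: latent class, 0 (indolent) or 1 (aggressive)
    ages: ages at each PSA visit (length M)
    volume: prostate volume, the single covariate in X_im
    mu: mu[k] is the class-k mean of the unscaled (intercept, age) effects
    Sigma: 2 x 2 covariance of the unscaled effects (shared across classes here)
    xi: length-2 scale vector, so that b_i = diag(xi) @ b_tilde_i
    beta: population-level coefficient on prostate volume
    sigma: residual standard deviation
    """
    ages = np.asarray(ages, dtype=float)
    b_tilde = rng.multivariate_normal(mu[eta], Sigma)  # unscaled, class-specific mean
    b = np.diag(xi) @ b_tilde                          # scaled patient-level effects
    Z = np.column_stack([np.ones_like(ages), ages])    # Z_im: intercept and age
    X = np.full((ages.size, 1), float(volume))         # X_im: prostate volume
    mean = X @ np.atleast_1d(beta) + Z @ b             # X_im beta + Z_im b_i
    return mean + rng.normal(0.0, sigma, size=ages.size)


rng = np.random.default_rng(1)
mu = {0: np.array([0.5, 0.01]), 1: np.array([0.8, 0.03])}  # illustrative values only
y = simulate_log_psa(1, [60, 60.5, 61], 40.0, mu, 0.01 * np.eye(2),
                     np.ones(2), 0.01, 0.1, rng)
```

Because the class enters only through the mean of the unscaled effects, an aggressive-class patient differs from an indolent one in expected PSA level and age slope, which is how PSA carries information about $\eta_i$ in this component.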


2.3 Biopsy Occurrence $B_{ij}$ and Result $R_{ij}$ for patient $i$ in time interval $j = 1, \ldots, J_i$


We then consider information about the true cancer state contained in the occurrence and results
of prostate biopsies. Biopsy data are categorized into discrete time intervals, with $(B_{ij}, R_{ij})$
denoting binary outcomes for individual $i$ in time interval $j$. $B_{ij}$ indicates whether a biopsy
was performed ($B_{ij} = 1$) or not ($B_{ij} = 0$) and, when it was performed, $R_{ij}$ indicates whether grade
reclassification occurred ($R_{ij} = 1$) or not ($R_{ij} = 0$). $B_{ij}$ and $R_{ij}$ are defined for $j = 1, \ldots, J_i$,
where $J_i$ is the time interval of reclassification or censoring for patient $i$. For each time interval,
we use logistic regression to model the occurrence of a biopsy and, when a biopsy was performed,
its result; both outcomes are conditional on true cancer state:

$$\mathrm{logit}\{P(B_{ij} = 1 \mid \eta_i, U_{ij}, \nu)\} = U_{ij}\nu_1 + \eta_i \nu_2 + U_{ij}\eta_i \nu_3 \tag{1}$$

$$\mathrm{logit}\{P(R_{ij} = 1 \mid \eta_i, V_{ij}, B_{ij} = 1, \gamma)\} = V_{ij}\gamma_1 + \eta_i \gamma_2 + V_{ij}\eta_i \gamma_3 \tag{2}$$

where $U_{ij}$ and $V_{ij}$ are covariate vectors including time-varying predictors, and $\nu = (\nu_1, \nu_2, \nu_3)$
and $\gamma = (\gamma_1, \gamma_2, \gamma_3)$ are parameter vectors to be estimated that include the main effects of
covariates $U_{ij}$ or $V_{ij}$, $\eta_i$, and the possible interactions $U_{ij}\eta_i$ and $V_{ij}\eta_i$, respectively. Since
reclassification occurs at most once, Equation (2) corresponds to a modified pooled logistic
regression model for time-to-reclassification in which only intervals with biopsies contribute.

This model specification represents three important aspects of data generated in active
surveillance: whether a biopsy is performed may be informative of true cancer state,
time-to-reclassification depends on a patient’s decision to receive a biopsy, and biopsy outcomes are
prone to measurement error. In this application, $U_{ij}$ and $V_{ij}$ may include age, time since
diagnosis, and calendar date. Previous PSA and biopsy results may also influence the decision to get a
biopsy, but they are assumed not to influence biopsy findings.
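As a sketch of how Equations (1) and (2) enter the likelihood for one patient, the following Python function evaluates the log of the biopsy-occurrence and reclassification factors at a given value of $\eta_i$. It is an illustration under assumptions (a hypothetical helper `biopsy_loglik`, covariates passed as NumPy matrices, and each component of $\nu = (\nu_1, \nu_2, \nu_3)$ and $\gamma = (\gamma_1, \gamma_2, \gamma_3)$ supplied separately), not the authors' JAGS code. Intervals without a biopsy contribute only through Equation (1), which is what makes the reclassification model a modified pooled logistic regression.

```python
import numpy as np


def log_expit(x):
    """Numerically stable log of the inverse logit, log(1 / (1 + exp(-x)))."""
    return -np.logaddexp(0.0, -x)


def biopsy_loglik(eta, B, R, U, V, nu, gamma):
    """Log of the biopsy (Eq. 1) and reclassification (Eq. 2) likelihood
    factors for one patient, evaluated at a given eta (hypothetical helper).

    eta: latent class, 0 or 1
    B: length-J array of biopsy indicators B_ij
    R: length-J array of reclassification indicators R_ij; entries where
       B_ij == 0 are ignored (they may be NaN)
    U, V: J x D_U and J x D_V covariate matrices
    nu: (nu1, nu2, nu3), with nu1 and nu3 of length D_U and nu2 a scalar
    gamma: (gamma1, gamma2, gamma3), defined analogously for V
    """
    B = np.asarray(B, dtype=int)
    R = np.asarray(R, dtype=float)
    # Linear predictors: covariate main effects + class effect + interactions
    lp_b = U @ nu[0] + eta * nu[1] + eta * (U @ nu[2])
    lp_r = V @ gamma[0] + eta * gamma[1] + eta * (V @ gamma[2])
    # Biopsy occurrence contributes in every interval; log(1 - expit(x)) = log_expit(-x)
    ll_b = np.sum(B * log_expit(lp_b) + (1 - B) * log_expit(-lp_b))
    # Reclassification contributes only in intervals with a biopsy
    done = B == 1
    r = R[done]
    ll_r = np.sum(r * log_expit(lp_r[done]) + (1 - r) * log_expit(-lp_r[done]))
    return float(ll_b + ll_r)
```

The surgery factor of Section 2.4 has the same form, with $W_{ij}$ and $\omega$ in place of $U_{ij}$ and $\nu$ and without the restriction to intervals with a biopsy.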

2.4 Surgical Removal of Prostate $S_{ij}$ and its Cancer Lethality $\eta_i$

Lastly, to allow for the possibility that surgical removal of the prostate (and subsequent observation
of the true cancer state) is informative, we define $S_{ij}$ to be a binary indicator of surgery
($S_{ij} = 1$) or not ($S_{ij} = 0$) for individual $i$ during time interval $j$ for $j = 1, \ldots, J_{S_i}$, where $J_{S_i}$
is the time of surgery or other censoring for patient $i$ and $J_{S_i} \geq J_i$ for all $i$. The probability
of surgery in each time interval is modeled with logistic regression and conditional on the true
cancer state:

$$\mathrm{logit}\{P(S_{ij} = 1 \mid \eta_i, W_{ij}, \omega)\} = W_{ij}\omega_1 + \eta_i \omega_2 + W_{ij}\eta_i \omega_3,$$

where $W_{ij}$ is a vector of time-varying predictors and $\omega = (\omega_1, \omega_2, \omega_3)$ is a parameter vector to be
estimated. Age, time since diagnosis, calendar date, and previous PSA and biopsy results may all be
considered as possible predictors of surgery.


2.5 Posterior Distribution Estimation 


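Having specified models for each information source, we define the likelihood of the latent states,
patient-level coefficients, and population-level parameters given the observed data as the product
of the contributions of each component described above:

$$
\begin{aligned}
&L(\text{parameters, patient-level latent class and coefficients} \mid \text{data}) \\
&\quad = \prod_{i=1}^{n} \Bigg[ f(Y_i \mid b_i, X_i, Z_i, \beta, \sigma^2)\, g(\tilde{b}_i \mid \eta_i, \mu_{\eta_i}, \Sigma_{\eta_i}) \\
&\qquad \times \prod_{j=1}^{J_i} P(B_{ij} = 1 \mid \eta_i, U_{ij}, \nu)^{B_{ij}}\, P(B_{ij} = 0 \mid \eta_i, U_{ij}, \nu)^{1 - B_{ij}} \\
&\qquad \times \prod_{j=1}^{J_i} \Big\{ P(R_{ij} = 1 \mid \eta_i, V_{ij}, B_{ij} = 1, \gamma)^{R_{ij}}\, P(R_{ij} = 0 \mid \eta_i, V_{ij}, B_{ij} = 1, \gamma)^{1 - R_{ij}} \Big\}^{B_{ij}} \\
&\qquad \times \prod_{j=1}^{J_{S_i}} P(S_{ij} = 1 \mid \eta_i, W_{ij}, \omega)^{S_{ij}}\, P(S_{ij} = 0 \mid \eta_i, W_{ij}, \omega)^{1 - S_{ij}} \Bigg]
\end{aligned}
\tag{3}
$$

where $f$ and $g$ are multivariate normal densities for the vector of log-transformed PSAs $Y_i$ and
unscaled patient-level effects $\tilde{b}_i$, respectively, each with mean and covariance as defined in Section
2.2. $X_i$ denotes the matrix of covariate vectors $[X_{i1}, \ldots, X_{iM_i}]$; $Z_i$ is similarly defined.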

We use a Bayesian approach for model estimation. The prior distributions for the latent states 
and patient-level coefficients have been described (Sections 2.1 and 2.2, respectively). Standard 
prior distributions are used for model parameters, including a beta prior on the probability of 
having aggressive cancer (p) and minimally informative Gaussian priors on logistic regression 
model coefficients, as shown in the model summary given in Figure 2. The joint posterior
distribution of the parameters, latent states, and patient-level effects is proportional to the 
product of the likelihood and joint prior density of model parameters and is given explicitly in 
the online supplement. 


For those patients without $\eta_i$ observed, a data augmentation approach is used to sample the
true unobserved cancer state from its full conditional posterior at each iteration of the MCMC
sampling algorithm (Tanner and Wong, 1987). Averaging the resulting posterior sample produces
the posterior probability that a patient has a true Gleason 7 or higher prostate cancer, $P(\eta_i = 1 \mid \text{data})$.
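The data augmentation step can be sketched in a few lines. Because $\eta_i$ is binary, its full conditional follows from Bayes' rule: writing $\ell_i(k)$ for the sum of patient $i$'s $\eta$-dependent log-likelihood terms in Equation (3) (the density $g$ and the biopsy, reclassification, and surgery factors) evaluated at $\eta_i = k$,
$P(\eta_i = 1 \mid \cdot) = p\,e^{\ell_i(1)} / \{p\,e^{\ell_i(1)} + (1 - p)\,e^{\ell_i(0)}\}$.
The Python below is a minimal illustration with hypothetical function names, not the authors' JAGS implementation (JAGS performs this step internally); it works on the log scale to avoid numerical underflow.

```python
import numpy as np


def prob_eta1(p, loglik1, loglik0):
    """Full conditional P(eta_i = 1 | all other unknowns and the data).

    p: current draw of the prevalence of aggressive cancer
    loglik1, loglik0: patient i's eta-dependent log-likelihood terms
        evaluated at eta_i = 1 and eta_i = 0, respectively
    """
    log_a = np.log(p) + loglik1      # log p + l_i(1)
    log_b = np.log1p(-p) + loglik0   # log(1 - p) + l_i(0)
    return float(np.exp(log_a - np.logaddexp(log_a, log_b)))


def augment_eta(p, loglik1, loglik0, rng):
    """Draw eta_i from its full conditional (one data augmentation step)."""
    return int(rng.random() < prob_eta1(p, loglik1, loglik0))


# In a real sampler p and the log-likelihoods change every iteration;
# here they are held fixed purely for illustration.
rng = np.random.default_rng(0)
draws = [augment_eta(0.23, -41.2, -42.0, rng) for _ in range(5000)]
estimate = np.mean(draws)
```

Averaging the computed probabilities returned by `prob_eta1` across iterations, rather than the 0/1 draws, estimates the same posterior probability with lower Monte Carlo variance (a Rao-Blackwellized estimate).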
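[Figure 2: model summary table with columns Data (outcome), Covariates, Model, and Priors, one row per component.
Cancer state: outcome $\eta_i$; Bernoulli random variable, $\eta_i \sim \mathrm{Bern}(p)$; prior $p \sim \mathrm{Beta}(1, 1)$.
PSA: outcome $Y_{im}$, $m = 1, \ldots, M_i$; covariates $Z_{im}$: intercept, age and $X_{im}$: prostate volume; stratified multilevel regression, $Y_{im} \mid \eta_i = k \sim N(X_{im}\beta + Z_{im} b_i, \sigma^2)$ with $b_i = \mathrm{diag}(\xi)\,\tilde{b}_i$ and $\tilde{b}_i \mid \eta_i = k \sim MVN(\mu_k, \Sigma)$; priors on $\mu_k$ ($k = 0, 1$) and $\beta$ multivariate normal with mean 0, $\Sigma \sim \mathrm{InvWish}(I_{D_Z}, D_Z + 1)$, and a $U(0, 10)$ prior on a PSA-model parameter.
Biopsy: outcome $B_{ij}$, $j = 1, \ldots, J_i$; covariates $U_{ij}$: age, time in AS, date, number of previous biopsies; logistic regression, $B_{ij} \mid \eta_i, U_{ij} \sim \mathrm{Bern}(P(B_{ij} = 1 \mid \eta_i, U_{ij}, \nu))$; prior on $\nu$ multivariate normal with mean 0.
Biopsy results: outcome $R_{ij}$, $j = 1, \ldots, J_i$; covariates $V_{ij}$: age, time in AS, date; logistic regression, $R_{ij} \mid \eta_i, V_{ij} \sim \mathrm{Bern}(P(R_{ij} = 1 \mid \eta_i, V_{ij}, \gamma))$; prior on $\gamma$ multivariate normal with mean 0.
Surgery: outcome $S_{ij}$, $j = 1, \ldots, J_{S_i}$; covariates $W_{ij}$: age, time in AS, date, number of previous biopsies and results; logistic regression, $S_{ij} \mid \eta_i, W_{ij} \sim \mathrm{Bern}(P(S_{ij} = 1 \mid \eta_i, W_{ij}, \omega))$; prior on $\omega$ multivariate normal with mean 0.
All multivariate normal priors have covariance equal to a power of 10 times the identity matrix of matching dimension.]

Figure 2: Model summary with priors used for application to Johns Hopkins Active Surveillance
data. $D_X$ is the length of vector $X$ and $I_{D_X}$ is the identity matrix with dimension $D_X \times D_X$.
$D_Z$, $D_U$, $D_V$, and $D_W$ and the associated identity matrices are similarly defined for covariate
vectors $Z$, $U$, $V$, and $W$.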


3 Johns Hopkins Active Surveillance Cohort 

3.1 The Data 

From January 1995 to June 2014, the Johns Hopkins Active Surveillance (JHAS) cohort enrolled
1,298 prostate cancer patients (Tosoian et al., 2015). This study prospectively follows patients
with very-low-risk or low-risk prostate cancer diagnoses (according to criteria outlined in Epstein
et al. (1994)) who elect to delay curative intervention in favor of active surveillance (AS). Results
of all prior PSA tests and diagnostic biopsies are collected at enrollment. As part of the surveillance
regimen, PSA tests are performed every six months and biopsies are performed annually,
though biopsy intervals may vary based upon patient preferences and clinician recommendations.
Treatment is recommended upon biopsy grade reclassification, that is, when the Gleason score
assigned on a biopsy first exceeds six. Some patients also choose to undergo treatment prior to
reclassification. For patients who elect surgical removal of the prostate, the true Gleason score
assigned to the entire prostate after pathologic assessment is recorded when available.


A total of 874 patients who met study criteria and had at least two PSA measurements
and at least one post-diagnosis biopsy as of October 1, 2014 were included in the analysis.
Patient outcomes are given in Figure 3. Grade reclassification was observed in 160 patients
(18% of the analysis cohort). Notably, over a quarter of patients with grade reclassification who
underwent prostatectomy were downgraded after surgery (17/65), while nearly a third of patients
who underwent prostatectomy in the absence of grade reclassification were upgraded (30/96).
Further details on the analysis dataset are given in the online supplement.


3.2 Model Specification 


We applied models with and without the biopsy and surgery informative observation process
(IOP) components to data from the JHAS cohort.

PSA observations were modeled with a hierarchically-centered multilevel model, as described
in Section 2.2. Patient-level coefficients for intercept and age were estimated for each patient.
A shared covariance matrix was assumed for the unscaled patient-level effects, that is,
$\Sigma_0 = \mathrm{Var}(\tilde{b}_i \mid \eta_i = 0) = \mathrm{Var}(\tilde{b}_i \mid \eta_i = 1) = \Sigma_1$, in order to reduce model complexity. (The plausibility
of this assumption was checked by examining estimated covariance matrices in the subset of
patients with known cancer state.) The PSA model also included a population-level coefficient
for prostate volume.


Biopsy, reclassification, and surgery observations were categorized into annual intervals, and
exploratory data analysis was performed to identify predictors of each. Covariates were selected
that lowered the Akaike Information Criterion (AIC) of a multivariable logistic regression model
for each outcome (Akaike, 1998) and are listed in Figure 2. Natural splines with up to four
degrees of freedom and knots at percentiles of the predictor variable were used when doing so
lowered the AIC. Additional details are given in the online supplement.

Model parameters and their minimally informative priors are presented in the model summary
given in Figure 2. Posterior sampling was performed with JAGS (Plummer, 2011) via the R package
R2jags (Su and Yajima, 2015). Parallel chains were run to confirm that model estimates converged
to similar values. Cumulative quantile and trace plots were also used to monitor convergence.
Analysis code, posterior sampler settings, and diagnostic plots are included in the supplementary
material.

Figure 3: CONSORT diagram for Johns Hopkins Active Surveillance prospective cohort patients
included in this analysis. Post-surgery full prostate Gleason score (GS) observations are also
given (circled). (Six patients who underwent prostatectomy did not have true GS observations
available.)


3.3 Model Assessment 


Predictive accuracy was assessed among patients with post-surgery true Gleason score observations.
Out-of-sample posterior predictions of $\eta$ were obtained for each patient by removing
his true state observation from the analysis dataset and re-running the posterior sampler with
an additional data augmentation step for the patient of interest. Out-of-sample predictions of
$\eta_i$ were then compared to known values with receiver operating characteristic (ROC) curves
(Hanley and McNeil, …) and calibration plots (Steyerberg et al., 2010). For the former, the
area under the curve (AUC) and associated 95% bootstrapped intervals were calculated. For
the latter, a plot comparing posterior predictions to observed rates of class membership was
constructed by performing logistic regression of the observed true state on a natural spline
representation of out-of-sample posterior predictions (degrees of freedom = 2). The mean squared
error (MSE) between observed and predicted cancer state was also calculated. For comparison,
posterior predictions were obtained from a logistic regression model fit with data from patients
with post-surgery observations of $\eta$; covariates included age, time since diagnosis, and PSA and
biopsy results. We also compared the specificity of model predictions to the specificity of using final
biopsy results to predict the true cancer state, fixing sensitivity at the observed true positive
rate of biopsy Gleason score (dichotomized as $< 7$ or $\geq 7$).
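The accuracy summaries described above can be computed with a few lines of Python. The sketch below is illustrative and uses hypothetical function names: AUC is computed as the Mann-Whitney probability that a randomly chosen patient with $\eta = 1$ receives a higher prediction than one with $\eta = 0$ (ties counted as one half), the false positive rate is read off at the largest threshold that reaches a target true positive rate, and intervals use a percentile bootstrap over patients. The default number of bootstrap replicates and the rule for skipping resamples that contain only one class are assumptions, not details of the analysis.

```python
import math

import numpy as np


def auc(scores, labels):
    """Mann-Whitney AUC: P(score of a case > score of a non-case), ties = 1/2."""
    s = np.asarray(scores, dtype=float)
    y = np.asarray(labels, dtype=int)
    diff = s[y == 1][:, None] - s[y == 0][None, :]
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))


def mse(scores, labels):
    """Mean squared error between predicted probabilities and observed 0/1 states."""
    s = np.asarray(scores, dtype=float)
    y = np.asarray(labels, dtype=float)
    return float(np.mean((s - y) ** 2))


def fpr_at_tpr(scores, labels, target_tpr):
    """FPR of the rule 'predict eta = 1 when score >= t', where t is the largest
    threshold whose true positive rate reaches target_tpr (0 < target_tpr <= 1)."""
    s = np.asarray(scores, dtype=float)
    y = np.asarray(labels, dtype=int)
    pos = np.sort(s[y == 1])[::-1]                  # case scores, highest first
    # number of cases that must be flagged; small tolerance guards float error
    k = math.ceil(target_tpr * pos.size - 1e-9)
    t = pos[k - 1]
    return float(np.mean(s[y == 0] >= t))


def bootstrap_interval(stat, scores, labels, n_boot=1000, level=0.95, seed=0):
    """Percentile bootstrap interval for stat, resampling patients with replacement."""
    rng = np.random.default_rng(seed)
    s = np.asarray(scores, dtype=float)
    y = np.asarray(labels, dtype=int)
    values = []
    while len(values) < n_boot:
        idx = rng.integers(0, s.size, size=s.size)
        if y[idx].min() == y[idx].max():            # skip single-class resamples
            continue
        values.append(stat(s[idx], y[idx]))
    alpha = (1.0 - level) / 2.0
    lo, hi = np.quantile(values, [alpha, 1.0 - alpha])
    return float(lo), float(hi)
```

For instance, `fpr_at_tpr(predictions, observed_eta, 0.62)` would mirror a fixed-sensitivity comparison at the 0.62 true positive rate reported in Section 4, where `predictions` and `observed_eta` are hypothetical arrays of out-of-sample posterior probabilities and post-surgery states.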


Calibration plots were also drawn to assess model fit for outcomes observed on all patients:
the occurrence of a biopsy, grade reclassification on biopsy, and the occurrence of surgery. Code
for reproducing all plots is available in the supplementary material.


3.4 Simulations 

We performed a simulation study to examine model performance in this application. Two hundred
simulated datasets were generated using posterior estimates of model parameters obtained from
the biopsy and surgery IOP analysis of JHAS data. For each dataset, the proposed model was
estimated under four settings: unadjusted (no IOP components), biopsy IOP only, surgery IOP
only, and both biopsy and surgery IOP components. Posterior predictions of the latent state
were obtained for all simulated patients and compared to known (data-generating) values. For
patients without surgery, posterior samples of $\eta$ were generated with a data augmentation step
as a matter of course in model estimation. For patients with surgery, the posterior probability
of $\eta = 1$ was estimated via an importance sampling algorithm performed on the joint posterior
(Bishop et al., 2006), which is less computationally intensive than the out-of-sample methods
used in Section 3.3. (See the technical report of Fisher et al. (2015) for further details.) Posterior
predictions of $\eta$ were also compared to fitted probabilities from a logistic regression model. Code
for generating data, estimating the joint posterior, and obtaining predictions is included in the
supplementary material.
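One standard way to approximate out-of-sample predictions from a single posterior run is leave-one-out importance sampling; the sketch below assumes that approach, which may differ from the algorithm detailed by Fisher et al. (2015). Draws $\theta^{(s)}$ from the full-data posterior, in which patient $i$'s $\eta_i$ is observed, are reweighted by $1 / P(\eta_i = \eta_i^{\mathrm{obs}} \mid \theta^{(s)}, \text{patient } i\text{'s other data})$, because the full-data posterior equals the posterior without $\eta_i$ multiplied by that probability (up to a constant). The weighted average of $q_s = P(\eta_i = 1 \mid \theta^{(s)}, \ldots)$ then estimates the prediction that would have been made had $\eta_i$ not been observed. The function name `loo_prob_eta1` and its inputs are hypothetical.

```python
import numpy as np


def loo_prob_eta1(q, eta_obs):
    """Importance-sampling estimate of P(eta_i = 1 | data without eta_i).

    q: length-S array, q[s] = P(eta_i = 1 | theta_s, patient i's other data),
       evaluated at draws theta_s from the full-data posterior
    eta_obs: patient i's observed post-surgery state (0 or 1)
    """
    q = np.asarray(q, dtype=float)
    # weight = 1 / P(eta_i = eta_obs | theta_s, ...): undoes conditioning on eta_obs
    w = 1.0 / q if eta_obs == 1 else 1.0 / (1.0 - q)
    return float(np.sum(w * q) / np.sum(w))


# With eta_obs = 1 the estimate reduces to the harmonic mean of q.
est = loo_prob_eta1([0.5, 0.25], eta_obs=1)
```

Such weights can be highly variable when some $q_s$ are near 0 (for $\eta_i^{\mathrm{obs}} = 1$) or near 1 (for $\eta_i^{\mathrm{obs}} = 0$), so the effective sample size of the weights is worth checking before relying on the estimate.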


4 Results 


The estimated marginal probability of harboring a prostate cancer with Gleason score of
7 or above was 0.23 (95% CI: 0.16, 0.33) for the proposed model with biopsy and surgery IOP
components, 0.20 (0.14, 0.28) with surgery IOP only, 0.31 (0.24, 0.39) with biopsy IOP only, and 0.30
(0.23, 0.38) with no IOP components. Patients with $\eta = 1$ were less likely to receive biopsies,
leading to underestimation of $p$ in models without the biopsy IOP component, and more likely
to elect surgery, such that $p$ was overestimated in models without the surgery IOP component.
Parameter estimates and credible intervals from all models are given in the online supplement
(Appendix Tables A3-A7).


A histogram of predictions of $\eta$ from the model with biopsy and surgery IOP components is
given in Figure 4(a). Patients with posterior predictions above 60% are primarily those who both
experienced grade reclassification (solid bars) and elected prostatectomy (red and green). Of AS
patients who neither reclassified (diagonal shading) nor underwent surgery (black), 95% have
posterior predictions lower than 50%, and a majority have predictions below 20%. Figure
4(b) shows a scatterplot comparing posterior probabilities of aggressive cancer, $P(\eta = 1)$, between
models with and without IOP components. The models produce similar posterior predictions
for most patients, particularly those to whom the non-IOP model assigns very low
risk. Inclusion of biopsy and surgery IOP components decreases posterior predictions of $\eta$ most
markedly for patients with frequent biopsies and no surgery (larger black circles below the $x = y$
line) and tends to increase posterior predictions for patients who elect surgery or have infrequent
biopsies (colored or smaller black circles, respectively, above the $x = y$ line). These trends are
further illustrated by density plots of predictions of $\eta$ stratified by reclassification and surgery
given in the online supplement.

Figure 4: Posterior predictions of true prostate cancer state. On both plots, coloring indicates
whether $\eta$ was observed and, if so, its value. In the histogram (a), diagonal shading represents
patients whose final biopsy was assigned a Gleason score of 6, while solid bars represent patients
whose final biopsy was assigned a Gleason score of 7 or higher (i.e., grade reclassification). In the
scatterplot (b), circle size indicates the frequency with which a patient received prostate biopsies;
larger circles represent more frequent biopsies, while smaller circles represent less frequent biopsies.


Posterior predictions of $\eta$ from the proposed model with biopsy and surgery IOP components
were more accurate than those from the proposed model with a single or no IOP components
or from logistic regression (Figures 5(a) and 6(a)). The out-of-sample AUC among patients
with observed true cancer state is highest for the biopsy and surgery IOP model (0.75, 95%
bootstrapped interval: 0.67, 0.83), and the MSE from this model was also the lowest (0.201, 95%
Int: 0.17, 0.24; Table 9 in online supplement). While this improvement is slight, it is widely
recognized that large increases in classification accuracy are rarely achieved (Pepe et al., 2004).
The false positive rate (FPR) of predictions from the biopsy and surgery IOP model (0.14, 95%
Int: 0.07, 0.23) is also lower than that of the binary classifier based on final biopsy results
(FPR = 0.20, 95% Int: 0.12, 0.29) at a fixed true positive rate of 0.62 (the sensitivity of final
biopsies in patients with eventual surgery). The improvement in specificity offered by the biopsy
and surgery IOP model corresponds to avoiding, on average, 30% of unnecessary diagnoses of
more aggressive cancer in comparison to a diagnosis based solely on a patient’s most recent
biopsy (95% bootstrapped interval of FPR(biopsy) - FPR(model): -7.5%, 13%). These comparisons are
limited because accuracy of posterior predictions can only be assessed among patients with $\eta$
observed. Yet, we expect the predictive accuracy gained by incorporating IOP components to be
seen more definitively in patients without true state observations if biopsy and surgery results
are indeed MNAR. This is explored further with simulations.


Posterior predictions of η from the IOP model also appear to accurately estimate a patient's risk of having more aggressive cancer. The calibration plot in Figure 6(b) shows that, for patients with known values of η, the average posterior predicted probability of η = 1 is close to the observed proportion of η = 1, indicating that the model reasonably reproduces the mean of observations. The risks of clinical outcomes (biopsy results) and choices (occurrence of biopsy and surgery) for all patients also appear to be accurately estimated by the IOP model, as demonstrated by calibration plots in the online supplement (Appendix Figure A3).
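The calibration comparison can be sketched as follows. Figure 6(b) uses a smoothed curve with point-wise intervals; the hypothetical helper below instead groups predictions into equal-width bins, an assumption made for brevity, and reports the mean predicted probability next to the observed share of η = 1 in each bin. Good calibration means the two numbers are close in every well-populated bin.

```python
# Illustrative sketch; the helper name and equal-width binning are assumptions.

def binned_calibration(p_hat, eta, n_bins=5):
    """Group predictions into equal-width probability bins and return, for each
    non-empty bin, (mean predicted probability, observed rate of eta == 1, count)."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(p_hat, eta):
        k = min(int(p * n_bins), n_bins - 1)  # keeps p == 1.0 in the top bin
        bins[k].append((p, y))
    summary = []
    for members in bins:
        if members:
            mean_pred = sum(p for p, _ in members) / len(members)
            obs_rate = sum(y for _, y in members) / len(members)
            summary.append((mean_pred, obs_rate, len(members)))
    return summary
```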


Estimation of the time-to-surgery model depends on sufficient evidence, among patients with surgery, of a relationship between η and surgery time. To assess the strength of this evidence, we re-ran the biopsy and surgery IOP model with more informative priors on the coefficients capturing this relationship and found posterior estimates to be robust. Details are given in the online supplement (Appendix Section A3.1.6 and Figure A18).


Through simulation studies, we found that our sampling procedure produced unbiased estimates with nominal coverage (Tables A10-A14 in the online supplement) and that the proposed model with biopsy and surgery IOP components outperforms the other model variations and logistic regression when it correctly reflects the data-generating mechanism. The AUCs for each model among patients with and without post-surgery true state observations are given in Figure 5(b). Differences in predictive accuracy across models are similar in magnitude to those observed in the application (Figure 5(a)). As expected, the logistic regression model, which is estimated only with data from patients with true state observations, has the poorest predictive accuracy among patients without surgery. Predictions from models incorporating the surgery IOP component show appropriate calibration, while those without it overestimate the risk of aggressive prostate cancer (Figure A20, online supplement).

[Figure 5. Panel (a): JHAS: True State Observed Post-Surgery. Panel (b): Simulation: True State Observed Post-Surgery; Simulation: True State Unobserved. Each panel plots AUC (95% Interval) for the Biopsy, Surgery IOP; Biopsy IOP only; Surgery IOP only; Unadjusted; and Logistic models.]

Figure 5: Estimated AUC for predictions of η from the JHAS cohort (a) and simulation studies (b). Intervals in (a) are quantile-based 95% intervals from 10,000 bootstrap samples of patients with post-surgery observations of η. Intervals in (b) are quantile-based 95% intervals from the estimated AUC in 200 simulation studies.

[Figure 6. Panel (a): ROC curves: B, S IOP & non-IOP predictions. Panel (b): Calibration plot: B, S IOP predictions.]

Figure 6: Predictive accuracy of out-of-sample predictions of η among patients with true state observed in the JHAS cohort. In (a), the specificity of predictions from each model is highlighted at the sensitivity of a binary classifier defined by final biopsy result (*). In (b), the dark line shows the empirical rate of observing a true Gleason score of 7 or above (y-axis) given an out-of-sample posterior probability of true state (x-axis) under the model with biopsy and surgery IOP components; shading gives the 95% point-wise confidence interval. Perfect agreement lies on the line y = x (dotted line). Hashmarks at y = 0 and y = 1 correspond to observed cancer states (η = 0 and η = 1, respectively) for patients with post-surgery true state observations. Hashmarks are located along the x-axis at each patient's out-of-sample posterior probability of the true state.
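The intervals in Figure 5(a) are quantile-based intervals from bootstrap samples of patients. A minimal sketch of that procedure, assuming the AUC is computed as the Mann-Whitney probability that a patient with η = 1 receives a higher predicted probability than a patient with η = 0 (ties counted as one half). The helper names are hypothetical, and redrawing resamples that lack one of the classes is only one of several ways to handle them.

```python
# Illustrative sketch; function names are hypothetical, not from the paper's code.
import random


def auc(p_hat, eta):
    """Mann-Whitney AUC: P(prediction for an eta = 1 patient exceeds that for an
    eta = 0 patient), counting ties as 1/2."""
    pos = [p for p, y in zip(p_hat, eta) if y == 1]
    neg = [p for p, y in zip(p_hat, eta) if y == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_interval(p_hat, eta, n_boot=10_000, level=0.95, seed=1):
    """Quantile-based bootstrap interval for the AUC, resampling patients."""
    if not 0 < sum(eta) < len(eta):
        raise ValueError("need patients with both eta = 0 and eta = 1")
    rng = random.Random(seed)
    n = len(eta)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        e = [eta[i] for i in idx]
        if 0 < sum(e) < n:  # redraw resamples that lack one of the classes
            stats.append(auc([p_hat[i] for i in idx], e))
    stats.sort()
    # Simple order-statistic approximation to the lower and upper quantiles.
    lo = stats[int((1 - level) / 2 * n_boot)]
    hi = stats[int((1 + level) / 2 * n_boot) - 1]
    return lo, hi
```

The pairwise AUC is quadratic in the number of patients per resample; that is acceptable for a cohort of roughly 160 patients with surgery, though a rank-based formula would be faster.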


5 Discussion 


In this paper, we have presented a hierarchical Bayesian model for predicting latent cancer state among low-risk prostate cancer patients. Multiple models have been developed to predict biopsy results in this population (Ankerst et al., 2015; Truong et al., 2013). However, our model predicts the outcome of chief interest: the true underlying state of an individual's prostate cancer. Focusing on the actual health state, even when latent, is equivalent to dividing patients into subgroups for which optimal treatments differ. Such subsetting is the goal of precision medicine (Saria and Goldenberg, 2015).


The proposed model integrates four sources of information about whether a tumor is aggressive or indolent: repeated measures of the biomarker PSA; repeated results from tissue biopsies; repeated decisions to have a biopsy; and the time to surgical removal of the prostate. In the subset of patients who have their prostates removed, the true tumor pathology state is observed. This data-integrating method is an example of semi-supervised learning because patients both with and without true state observations are included in model estimation (Chapelle, Schölkopf, and Zien, 2006). We adjust for possible informative missingness by modeling the time until surgery as a function of the true state. While it would be ideal to assess model sensitivity to the parametric assumptions embedded in selection models for missing not at random data mechanisms (Daniels and Hogan, 2008), existing methods for re-parameterizing selection models as pattern mixture models do not accommodate event time outcomes with the possibility of censoring. Further research is needed to develop these methods.

The methods proposed here are tailored to available measurements that address the clinical questions arising from active surveillance of prostate cancer: should I have a biopsy this year; what is the chance my tumor is indolent; should I undertake removal or irradiation of my prostate despite the known serious side effects? The model extends naturally to provide improved answers to these questions as additional data become available. For example, when genetic markers for prostate cancer risk are identified, the probability distribution for the latent state (ρ) could easily be informed by subgroups defined by their expression. Similarly, when MRI or ultrasound images become commonly used before biopsy, these data can be included in the model as well. When some measurements are not available for all patients, the proposed framework can also adjust for informative observation of predictors and outcomes.


The proposed model can also be modified in response to advances in scientific understanding of the relationship between clinical measurements and the underlying cancer state. In particular, in the event of new research findings on the rate of progression in this population, the model could be extended to allow an indolent cancer to transition to a lethal one, for example, as a Markov process. Because an individual's true cancer state can only be observed once, the current data contain insufficient information to simultaneously support identifiability of both the rate of biopsy misclassification and the rate of pathological progression in the underlying state. The model currently assumes that an individual's cancer categorization (Gleason score) does not change over the time period under surveillance, while allowing for imperfect sensitivity and specificity of biopsies. This assumption reflects the current clinical understanding that biopsy upgrading in AS is more frequently due to misdiagnosis than to true grade progression (Porten et al., 2011). A more recent analysis by Inoue et al. (2014) suggests a rate of disease progression in the JHAS cohort of 12-14% within a decade of enrollment, but this estimate is sensitive to prior specification. A dynamic state extension would require strong prior knowledge about the progression rate parameter in order to be identifiable from the current data. The effect of allowing for a state transition would be to give greater weight to more recent PSA and biopsy outcomes when predicting the underlying state, rather than giving equal weight to all observations.


The proposed prediction model exemplifies the statistical underpinnings of a learning health care system (Goolsby, Olsen, and McGinnis, 2012; Smith et al., 2013), a system with the ability to continuously integrate patient data and medical knowledge to optimize patient care. As more patients enroll in the Johns Hopkins active surveillance cohort, and as more information is collected on existing patients, our ability to predict underlying health states and the likely trajectory of clinical outcomes will improve. Furthermore, importance sampling methods can be used to obtain real-time prediction updates based on the most current information in order to support decision-making in a clinical setting. An example interactive decision-support tool that provides fast predictions of a patient's latent prostate cancer state is demonstrated at https://rycoley.shinyapps.io/dynamic-prostate-surveillance.
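One way to read the importance-sampling remark above: saved posterior draws are reweighted by the likelihood of newly observed data, so the model need not be refit for each new observation. The sketch assumes the log-likelihood of the new observation has already been evaluated at each saved draw (that step is model-specific and omitted). The function name and the reclassification probabilities in the usage lines are hypothetical illustrations, not estimates from this paper. The effective sample size flags when the weights concentrate on few draws, in which case a full refit would be advisable.

```python
# Illustrative sketch; function name and example probabilities are hypothetical.
import math


def importance_update(eta_draws, loglik_new):
    """Update P(eta_i = 1) after new data by self-normalized importance sampling.

    eta_draws  : saved posterior draws (0/1) of patient i's latent state
    loglik_new : log p(new data | draw s), evaluated at each saved draw s
    Returns (updated P(eta_i = 1), effective sample size of the weights).
    """
    m = max(loglik_new)                        # subtract the max for stability
    w = [math.exp(l - m) for l in loglik_new]
    total = sum(w)
    w = [x / total for x in w]                 # normalized importance weights
    p_eta1 = sum(wi * e for wi, e in zip(w, eta_draws))
    ess = 1.0 / sum(wi * wi for wi in w)
    return p_eta1, ess


# 10 saved draws, 3 with eta = 1 (current P = 0.3); a new biopsy shows
# reclassification, assumed to occur w.p. 0.4 if eta = 1 and 0.1 if eta = 0.
draws = [1] * 3 + [0] * 7
loglik = [math.log(0.4 if e == 1 else 0.1) for e in draws]
p, ess = importance_update(draws, loglik)
print(round(p, 3))  # 0.632 (= 0.3*0.4 / (0.3*0.4 + 0.7*0.1))
```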


Supplementary Materials 

Simulated data, JAGS scripts, and R code to reproduce the analysis and figures are provided at http://github.com/rycoley/prediction-prostate-surveillance.
APPENDIX:


A Bayesian hierarchical model for prediction of latent health states from 
multiple data sources with application to active surveillance of prostate cancer 

This appendix contains additional details and results related to the proposed method, its application to the Johns Hopkins Active Surveillance (JHAS) cohort, and simulations based on the JHAS estimates.


A1 Methods 

A1.1 Posterior Estimation

The joint posterior distribution of the parameters, latent states, and patient-level effects is proportional to the product of the likelihood given in Equation (3) and the joint prior density of the model parameters:

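$$
\begin{aligned}
& p\Big(\rho, \beta, \nu, \gamma, \omega, (\mu_k, \Sigma_k);\ b_i,\ i = 1, \ldots, n;\ \eta_i,\ i = 1, \ldots, n_{S=0} \;\Big|\; \eta_i,\ i = n_{S=0} + 1, \ldots, n;\ (Y_i, X_i, Z_i), (B_i, U_i), (R_i, V_i), (S_i, W_i),\ i = 1, \ldots, n;\ \Theta \Big) \\
& \quad \propto \mathcal{L}\Big(\rho, \beta, \nu, \gamma, \omega, (\mu_k, \Sigma_k);\ b_i;\ \eta_i,\ i = 1, \ldots, n_{S=0} \;\Big|\; \eta_i,\ i = n_{S=0} + 1, \ldots, n;\ (Y_i, X_i, Z_i), (B_i, U_i), (R_i, V_i), (S_i, W_i),\ i = 1, \ldots, n \Big) \\
& \qquad \times \pi(\rho, \beta, \nu, \gamma, \omega, \mu_k, \Sigma_k \mid \Theta)
\end{aligned}
$$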

where $\pi(\cdot \mid \Theta)$ denotes the joint prior density for model parameters with hyperparameters $\Theta$, and indexing on $j$ and $k$ is suppressed for clarity of presentation. Patients are indexed such that $i = 1, \ldots, n_{S=0}$ refers to patients without surgery ($S = 0$), for whom $\eta_i$ is latent, and $i = n_{S=0} + 1, \ldots, n$ refers to patients with eventual surgery and observation of $\eta_i$. As in Equation (3), $f$ and $g$ are multivariate normal densities for the vector of log-transformed PSAs $Y_i$ and the unscaled random effects $b_i$, respectively, each with mean and covariance as in Section 2.2 of the accompanying paper. $X_i$ denotes the matrix of covariate vectors $[X_{i1}, \ldots, X_{iM_i}]$; $Z_i$, $U_i$, $V_i$, and $W_i$ are similarly defined. $B_i$, $R_i$, and $S_i$ denote vectors of all biopsy, reclassification, and surgery observations for individual $i$.
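For a patient without surgery, the latent state η_i is a binary unknown whose full conditional follows from the joint posterior above, assuming η_i ~ Bernoulli(ρ): P(η_i = 1 | ·) = ρ L_i(1) / {ρ L_i(1) + (1 - ρ) L_i(0)}, where L_i(k) collects patient i's likelihood contributions (random-effects density g, biopsy, reclassification, and surgery terms) evaluated with η_i = k. The analysis itself was run in JAGS, which performs such updates internally; the sketch below only illustrates the calculation and assumes the two log-likelihood sums have already been computed. Function names are hypothetical.

```python
# Illustrative sketch of one Gibbs update for a latent eta_i; names are hypothetical.
import math
import random


def eta_full_conditional(rho, loglik_eta0, loglik_eta1):
    """P(eta_i = 1 | everything else) for a patient whose eta_i is latent.

    rho          : current draw of the marginal probability P(eta = 1)
    loglik_eta0/1: patient i's summed log-likelihood contributions with eta_i
                   set to 0 or 1 (all other unknowns held at current values)
    """
    d = (math.log(rho) + loglik_eta1) - (math.log(1.0 - rho) + loglik_eta0)
    # Numerically stable logistic transform of the log posterior odds d.
    if d >= 0:
        return 1.0 / (1.0 + math.exp(-d))
    e = math.exp(d)
    return e / (1.0 + e)


def draw_eta(rho, loglik_eta0, loglik_eta1, rng=random):
    """One Gibbs draw of eta_i from its Bernoulli full conditional."""
    return int(rng.random() < eta_full_conditional(rho, loglik_eta0, loglik_eta1))
```

Patients with surgery (i > n_{S=0}) have η_i observed, so no such draw is needed for them; their η_i enters the likelihood as data.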
A2 Application to Johns Hopkins Active Surveillance Cohort


A2.1 The Data 


The number of observations and years of follow-up available for analysis are summarized in Table A1. Of the 874 patients, 318 (36%) were censored due to receiving some treatment, 130 (15%) were lost to follow-up, and 19 (2.2%) were censored due to death. (No patients died of prostate cancer.) Loss to follow-up was defined as two years without a PSA or biopsy after the most recent observation. The remaining 407 patients (47%) were still active in the program at the time of data collection.


As shown in the CONSORT diagram in Figure 3 of the accompanying paper, grade reclassification was observed in 160 patients (18% of the analysis cohort). Among patients with grade reclassification, 67 elected surgical removal of the prostate. An additional 100 patients elected prostatectomy in the absence of grade reclassification. In total, 167 patients (19% of the analysis cohort) underwent surgery, of whom 161 had a definitive post-surgical Gleason score determination. Biopsy-based Gleason scores and post-surgical true values are compared in Table A2.


A2.2 Model Specification 
A2.2.1 PSA model 

Prostate volume is a known source of patient-level variability in PSA and, for this reason, was 
included as a predictor in the multilevel model for log-PSA. Prostate volume was measured via 
ultrasound at some biopsies. Since increases in prostate volume due to age and cancer activity 
are expected to be of a smaller magnitude than the measurement error in ultrasound-guided 
volume assessment, the average of all available prostate volume observations was used for each 
patient. 
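The volume covariate described above can be sketched as a two-step calculation: average each patient's available ultrasound measurements, then standardize the per-patient averages (Table A4 reports the standardization as mean 57.5, sd 24.9). The text does not say whether the mean and sd are taken across patients or across all measurements, nor whether the sd is the population or sample version; the sketch assumes both are computed across per-patient averages with the population sd, and the function name is hypothetical.

```python
# Illustrative sketch; function name and standardization choices are assumptions.
from collections import defaultdict
from statistics import mean, pstdev


def standardized_patient_volume(records):
    """Average each patient's prostate-volume measurements, then standardize.

    records: iterable of (patient_id, volume) pairs, one per ultrasound reading.
    Returns {patient_id: (avg_volume - mean) / sd}, with mean and (population)
    sd taken across per-patient averages.
    """
    by_patient = defaultdict(list)
    for pid, vol in records:
        by_patient[pid].append(vol)
    avg = {pid: mean(vols) for pid, vols in by_patient.items()}
    values = list(avg.values())
    mu, sd = mean(values), pstdev(values)
    return {pid: (v - mu) / sd for pid, v in avg.items()}


z = standardized_patient_volume([("a", 40), ("a", 60), ("b", 70), ("c", 30)])
```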
A2.2.2 Biopsy, reclassification, and surgery models


The JHAS protocol is to perform a biopsy once per year. Hence, biopsy, reclassification, and surgery observations were categorized into annual intervals. A small number (1%) of intervals contained two biopsies. To accommodate this, we redefine the logistic regression model in Equation (1) as the probability of any biopsy during the year. Intervals with two biopsies then contributed two conditionally independent reclassification outcomes (Equation (2)) to the likelihood.
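The interval construction can be sketched as follows for one patient. The sketch assumes biopsy times are measured in years since the start of surveillance and that the number of annual intervals of follow-up is known; both that time origin and the function name are hypothetical. Each interval receives the indicator B_j = 1 if any biopsy occurred, and each biopsy in the interval contributes its own reclassification outcome.

```python
# Illustrative sketch; function name and time origin are assumptions.
import math
from collections import defaultdict


def annual_intervals(biopsies, n_years):
    """Collapse one patient's biopsies into annual intervals.

    biopsies: list of (years_since_start, reclassified) with reclassified in {0, 1}
    n_years : number of annual intervals of follow-up
    Returns {j: (B_j, [R outcomes in year j])} for j = 1..n_years, where
    B_j = 1 if any biopsy occurred in year j (the redefined Equation (1) outcome)
    and every biopsy in the year contributes its own R outcome (Equation (2)).
    """
    outcomes = defaultdict(list)
    for t, r in biopsies:
        j = math.floor(t) + 1  # year 1 covers [0, 1), year 2 covers [1, 2), ...
        outcomes[j].append(r)
    return {j: (int(j in outcomes), outcomes.get(j, [])) for j in range(1, n_years + 1)}


# Two biopsies fall in year 3, so that interval yields B_3 = 1 and two R outcomes.
print(annual_intervals([(0.9, 0), (2.2, 0), (2.8, 1)], n_years=3))
# {1: (1, [0]), 2: (0, []), 3: (1, [0, 1])}
```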


For the biopsy, reclassification, and surgery models, natural spline representations of continuous and discrete predictors (age, time in AS, calendar date, number of previous biopsies, and extent of cancer found in previous biopsies) were included when doing so lowered the AIC. The selected degrees of freedom and locations of quantile-based knots for each predictor are given in Tables A5, A6, and A7.
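To show what a natural spline term with quantile-based knots looks like, the sketch below builds the basis for one predictor. It uses the truncated-power construction of natural cubic splines from Hastie, Tibshirani, and Friedman (The Elements of Statistical Learning, Section 5.2.1) rather than the B-spline basis of R's `splines::ns`; for the same knots, the two span the same function space once an intercept is included. Interior knots are placed at equally spaced quantiles of the data and boundary knots at its range, consistent with the tables (three interior knots for df = 4, one for df = 2). Function names are hypothetical.

```python
# Illustrative sketch; function names are hypothetical, not from the paper's code.

def quantile(xs, q):
    """Sample quantile with linear interpolation (R's default, type 7)."""
    s = sorted(xs)
    h = (len(s) - 1) * q
    lo = int(h)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (h - lo) * (s[hi] - s[lo])


def natural_spline_basis(x, df, data):
    """Natural cubic spline basis (no intercept) evaluated at x.

    Interior knots: df - 1 equally spaced quantiles of `data`;
    boundary knots: min(data) and max(data).
    Returns df values: x itself followed by df - 1 cubic terms that are
    constrained to be linear beyond the boundary knots.
    """
    knots = [min(data)] + [quantile(data, k / df) for k in range(1, df)] + [max(data)]
    last = len(knots) - 1

    def d(k):
        cube = lambda u: max(u, 0.0) ** 3  # truncated cubic (u)_+^3
        return (cube(x - knots[k]) - cube(x - knots[last])) / (knots[last] - knots[k])

    return [x] + [d(k) - d(last - 1) for k in range(last - 1)]
```

Each basis column would then enter the logistic regression as a separate covariate, which is why Tables A5 and A6 report df coefficients per spline term.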


A2.3 Simulations 


Two hundred simulated datasets were generated using characteristics of the JHAS data and posterior estimates from the proposed model with biopsy and surgery IOP components. For each simulated dataset, the proposed model was estimated with no IOP components, a biopsy IOP component only, a surgery IOP component only, and both biopsy and surgery IOP components. The posterior median was recorded for each parameter, as well as whether the 95% quantile-based posterior credible interval contained the true data-generating value. The mean posterior median and coverage were summarized across all simulated datasets. Data-generating values for model parameters are given in Tables A10-A14. Code for simulating data is provided in the online supplement.
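The per-parameter simulation summary described above (mean posterior median and coverage of the 95% interval across datasets) can be sketched as follows; the function name and input layout are hypothetical. With 200 datasets, a coverage estimate near 0.95 carries a Monte Carlo standard error of about sqrt(0.95 × 0.05 / 200) ≈ 0.015, which is useful when judging whether coverage is nominal.

```python
# Illustrative sketch; function name and input layout are hypothetical.
import math


def summarize_simulations(results, true_value):
    """Summarize one parameter across simulated datasets.

    results   : list of (posterior_median, ci_lower, ci_upper), one per dataset
    true_value: the data-generating value of the parameter
    """
    medians = [m for m, _, _ in results]
    covered = [lo <= true_value <= hi for _, lo, hi in results]
    mean_median = sum(medians) / len(medians)
    coverage = sum(covered) / len(covered)
    return {
        "mean_median": mean_median,
        "bias": mean_median - true_value,
        "coverage": coverage,
        # Monte Carlo standard error of the coverage estimate (binomial).
        "coverage_mcse": math.sqrt(coverage * (1 - coverage) / len(covered)),
    }
```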
A3 Results

A3.1 Application to Johns Hopkins Active Surveillance cohort 

A3.1.1 Posterior parameter estimates 

Posterior estimates and 95% quantile-based credible intervals for model parameters are reported in Tables A3-A7. Results are given for four versions of the proposed model: no IOP components (“Unadjusted”), biopsy IOP component only (B IOP), surgery IOP component only (S IOP), and both biopsy and surgery IOP components (B, S IOP).


In Table A3, we see that the estimated marginal probability of harboring aggressive cancer (ρ) is higher in models that include a biopsy IOP component and lower in models with a surgery IOP component. This observation is consistent with posterior estimates of ν (Table A5) and ω (Table A7). Coefficient estimates in the biopsy model indicate that patients with η = 1 are less likely to receive an annual biopsy (last row in Table A5). Without adjusting for MNAR biopsy results, the modeling approach is overly optimistic about a patient's true cancer state because it assumes that a patient who skips a biopsy is as likely to have a favorable biopsy as a patient with the same covariate data who does have a biopsy performed. Meanwhile, inclusion of the surgery IOP component identifies evidence that patients with η = 1 are more likely to elect surgical removal of the prostate, particularly if they have also experienced grade reclassification (last two rows in Table A7). Without accounting for this informative missing data mechanism, we run the risk of overestimating risk in this population.
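The mechanism behind the overly optimistic unadjusted estimate can be illustrated with a toy simulation in which patients with η = 1 are less likely to be biopsied, mirroring the direction (not the size) of the estimated biopsy coefficient. The parameter values and function name are hypothetical, not JHAS estimates. In this simplified setting, the share of aggressive tumors among biopsied patients understates the cohort-wide share; an analysis that treats skipped biopsies as uninformative is exposed to the same kind of bias.

```python
# Toy MNAR illustration; all parameter values and names are hypothetical.
import random


def simulate_biopsy_selection(n, rho, p_biopsy_eta0, p_biopsy_eta1, seed=0):
    """Biopsy probability depends on the latent state eta.

    Returns (share with eta = 1 in the whole cohort,
             share with eta = 1 among patients who were biopsied).
    """
    rng = random.Random(seed)
    eta = [int(rng.random() < rho) for _ in range(n)]
    biopsied = [
        e for e in eta
        if rng.random() < (p_biopsy_eta1 if e == 1 else p_biopsy_eta0)
    ]
    return sum(eta) / n, sum(biopsied) / len(biopsied)


# 25% aggressive; biopsy probability 0.8 if eta = 0 and 0.6 if eta = 1.
full, among_biopsied = simulate_biopsy_selection(100_000, 0.25, 0.8, 0.6)
# Expected among biopsied: 0.25*0.6 / (0.25*0.6 + 0.75*0.8) = 0.20, versus 0.25 overall.
```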


A3.1.2 Posterior estimates of η


Figure A1 shows density plots for posterior predictions of η for all patients in the JHAS cohort from four non-IOP/IOP combinations of the proposed model and a logistic regression model (coefficient estimates in Table A8). This figure reinforces the model comparisons illustrated in the histogram and scatterplot in Figure 4 of the accompanying paper. In particular, predictions of the true cancer state for patients with grade reclassification and no surgery (top right plot) are considerably lower in the models with surgery IOP components. In the simulation study below, we see a similar trend when the data are generated according to the estimated dual IOP model: if patients with aggressive cancer are more likely to have surgery sooner (particularly after grade reclassification), models that do not adjust for informative surgery decisions will overestimate patient risk.


A3.1.3 Predictive accuracy 


We provide additional assessment of the predictive accuracy of all models considered here. Measures of predictive accuracy of the proposed model among patients with post-surgery true state observations are summarized in Table A9 (AUC and FPR estimates correspond to those presented in Figures 5(a) and 6(a) of the accompanying paper). We see that the AUC, MSE, and FPR at TPR = 0.62 are improved by using the proposed model with both biopsy and surgery IOP components. Calibration plots for the proposed model with no, only biopsy, only surgery, and both IOP components are given in Figures A2(a)-A2(c); a calibration plot for the logistic regression model is given in Figure A2(d). All models appear to produce well-calibrated estimates.


Calibration plots were also constructed for outcomes observed on all patients. Figure A3 presents a calibration plot for the probability of clinical outcomes (biopsy results) and choices (occurrence of biopsy and surgery) under the proposed model with biopsy and surgery IOP components. Solid lines show, for each saved iteration of the sampling chain, the fitted values of a logistic regression of the observed outcome on the natural spline representation of each person-year's posterior probability of an event. Plotting symbols at y = 0 and y = 1 correspond to the observed outcome (B_ij, R_ij, and S_ij) and are plotted on the x-axis at the mean posterior probability for that person-year; plotting symbol shape and color indicate eventual observations of the true state. Posterior probabilities and observed rates are generally similar, with closer agreement in ranges with more data.


A3.1.4 Individual-level effect estimates in PSA model 


Posterior estimates from the multilevel model for PSA are displayed in Figure A4. In this plot, each circle represents the scaled patient-level intercept (x-axis) and slope (y-axis) estimates for a single patient. Filled circles represent patients for whom the true cancer state was observed, with red indicating an aggressive cancer found after surgery and green indicating a determination of indolent cancer. Among patients for whom the true state was not observed, the color of open circles reflects the posterior probability of aggressive cancer, ranging from 0-25% (green) to 76-100% (red). Finally, credible ellipses show the posterior mean and covariance of patient-level coefficients in each partially latent class. There is a fair amount of overlap in these ellipses, indicating that PSA level and trajectory do not provide strong evidence of the true state for many patients. PSA is more informative for patients with particularly high or low levels and trajectories or for patients with shorter follow-up. We also note that the Hopkins cohort has PSA requirements for enrollment; PSA data may be more informative in a cohort with less strict enrollment criteria.


A3.1.5 MCMC settings and convergence diagnostics 


For all IOP combinations, five independent posterior sampling chains were run for 50,000 iterations. The first 25,000 iterations were discarded as burn-in, and posterior samples were saved at every twentieth iteration thereafter. Convergence of the posterior sampling algorithm was assessed with trace, cumulative quantile, and density plots; these are given for the model with biopsy and surgery IOP components in Figures A5-A17 and exhibit appropriate convergence. Trace plots (left) show sampled values for each chain (indicated by color). Cumulative quantile plots (middle) show running posterior quantiles for the median (solid line) and the 2.5th and 97.5th percentiles (dotted lines) for one sampling chain. Plots in the rightmost column show posterior densities for each sampling chain (indicated by color) alongside the prior density (dotted lines).
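The sampler settings above imply (50,000 - 25,000)/20 = 1,250 saved draws per chain, or 6,250 across the five chains. The sketch below reproduces that bookkeeping and computes the running quantiles shown in a cumulative quantile plot. It assumes iterations are numbered from 1, so the first saved draw is iteration 25,020; the function names are hypothetical, and the quantile rule (linear interpolation, R's default) is an assumption.

```python
# Illustrative sketch; function names are hypothetical.
import bisect


def saved_iterations(n_iter=50_000, burn_in=25_000, thin=20):
    """1-based iteration numbers kept after burn-in and thinning."""
    return list(range(burn_in + thin, n_iter + 1, thin))


def _quantile_sorted(s, q):
    """Linear-interpolation quantile of an already sorted list."""
    h = (len(s) - 1) * q
    lo = int(h)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (h - lo) * (s[hi] - s[lo])


def running_quantiles(draws, probs=(0.025, 0.5, 0.975)):
    """Quantiles of draws[0..t] for every t, as plotted in a cumulative quantile plot."""
    seen, out = [], []
    for x in draws:
        bisect.insort(seen, x)  # keep the draws seen so far in sorted order
        out.append(tuple(_quantile_sorted(seen, p) for p in probs))
    return out


kept = saved_iterations()
print(len(kept), 5 * len(kept))  # 1250 6250
```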


A3.1.6 Robustness of IOP estimates

The posterior distributions of IOP coefficients, i.e., the effects of η on biopsy occurrence and surgery, indicate that the data contain evidence of informative missingness, as shown in Figure A18 (black, solid lines in each plot). The 95% quantile-based credible intervals of the log-odds ratio (OR) for the effect of η = 1 on the probability of having a biopsy in an interval (left-hand plot) and of the log-OR for the interaction between η = 1 and prior grade reclassification (right-hand plot) exclude zero (vertical line).


An important question is whether estimation of the additional parameters in the IOP model, especially those associated with observation of the true cancer state, is supported by evidence in the data or, instead, only identifiable through the likelihood construction and prior specification. To assess the robustness of posterior predictions to prior specification, we refit the IOP model with multiple informative priors on both the log-OR of surgery given the true state and the log-OR of surgery given an interaction between the true state and prior biopsy results. Specifically, we considered all combinations of normal priors with a variance of one and means corresponding to ORs of one-half, one, and two for (i) the association of η_i with the probability of surgery for patient i in year j and (ii) the interaction of η_i with an indicator of grade reclassification for patient i during or prior to year j. The resulting posterior distributions, shown in Figure A18, demonstrate relative robustness to prior specification and affirm confidence in posterior predictions from the IOP model with vague priors. The primary effects of specifying these more informative priors appear to be a reduction in the variability of posterior distributions and an attenuation of the estimated effect of the interaction of η = 1 and prior grade reclassification on the risk of surgery. Posterior predictions of η and the model's predictive accuracy were not changed by specifying informative priors on IOP components (not shown). It appears that repeated contributions to the likelihood of the probability of not having surgery (P(S_ij = 0)) in intervals prior to the decision to have surgery provide appropriate evidence about the relationship between the true cancer state and its eventual observation.
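The sensitivity analysis above crosses three prior means for each of two coefficients, giving nine prior specifications. The sketch below enumerates them on the log-OR scale, under the assumption that each prior is normal on the log-OR with mean log(0.5), 0, or log(2), so the stated ORs are prior medians on the OR scale. It also shows why variance-one priors are informative yet not restrictive: a N(0, 1) prior on the log-OR puts 95% of its mass on ORs between roughly 0.14 and 7.1. Names are hypothetical.

```python
# Illustrative sketch; constant and function names are hypothetical.
import itertools
import math

PRIOR_MEAN_ORS = (0.5, 1.0, 2.0)  # prior centers on the odds-ratio scale
PRIOR_SD = 1.0                    # variance one on the log-OR scale


def prior_grid():
    """All (eta main effect, eta x reclassification) prior means, log-OR scale."""
    log_means = [math.log(o) for o in PRIOR_MEAN_ORS]
    return list(itertools.product(log_means, repeat=2))


def prior_or_interval(mean_log_or, sd=PRIOR_SD, z=1.959964):
    """Central 95% prior interval for the odds ratio under N(mean_log_or, sd^2)."""
    return math.exp(mean_log_or - z * sd), math.exp(mean_log_or + z * sd)


grid = prior_grid()
print(len(grid))  # 9
lo, hi = prior_or_interval(0.0)
print(round(lo, 2), round(hi, 1))  # 0.14 7.1
```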


A3.2 Simulations 

A3.2.1 Posterior parameter estimates 


Posterior estimates and coverage for all models considered are given in Tables A10-A14. Estimation appears unbiased for the model with biopsy and surgery IOP components (which was used to generate the data), and credible intervals from that model have nominal or slightly conservative coverage. Bias in models without both biopsy and surgery IOP components is most prominent in coefficients related to the true cancer state. For example, the log odds-ratio for the association between true cancer state η = 1 and risk of reclassification was overestimated in the unadjusted and biopsy IOP only models (last row, Table A13).


A3.2.2 Posterior estimates of η

Density estimates for the posterior predictions of η from a single simulated dataset are shown in Figure A19. These plots show trends in posterior predictions across model options similar to those observed in the application to JHAS cohort data (Figure A1), which indicates that the differences in posterior predictions across models (particularly those seen in the subgroup with grade reclassification and no surgery) would be expected if the dual biopsy and surgery IOP data-generating mechanism were correct.

A3.2.3 Predictive accuracy 


Table A15 gives the average AUC and MSE among patients with η observed and unobserved from 200 simulation studies. The AUC is highest for both groups of patients in the proposed model with biopsy and surgery IOP components. MSE is actually higher among patients without η observed for all versions of the proposed model, likely due to a calibration accuracy similar to that among patients with η observed and a greater sample size. MSE of predictions from the logistic regression model increased among patients without η observed. This increase is expected since the logistic model was estimated using only data from patients with post-surgery η observations, instead of the semi-supervised approach of the proposed model.


Figure A20 gives calibration plots for predictions from each of these models. Across all models, the pointwise confidence intervals for calibration plots are much narrower than those for the JHAS application (Figure A2) because these plots contain predictions on all patients, not just the smaller subset with surgery. The top row of plots shows that predictions from the models with no IOP components and with only a biopsy IOP component tend to overestimate the probability of harboring aggressive prostate cancer; this observation is consistent with the higher estimates of ρ seen in both the application and simulation results and with our discussion above (Section A3.1.1) regarding the influence of surgery IOP components on risk estimates. We also see that predictions from the logistic regression model are poorly calibrated (Figure A21(e)). In comparison, predictions from the logistic regression model in the JHAS cohort analysis (Figure A2(d)) seem well calibrated when only patients with η observed are considered.


Figure A21 shows calibration plots limited to patients with true state observations. These plots are comparable to those for the JHAS application (Figure 6(b) in the accompanying paper and Figure A2 in the appendix). Here, the proposed model appears to underestimate the risk of more aggressive disease for those at lowest risk (posterior P(η = 1) < 0.2) when only patients with surgery are considered, though plots in Figure A20 showed accurate calibration. This suggests that patients with and without surgery are not exchangeable at given levels of posterior risk. A larger number of patients receiving surgery, or a stronger signal for the association between η and surgery, may be needed to reduce this apparent bias in predictions.


A3.3 Individualized predictions 

The goal of this modeling approach is to provide individual patients with predictions of their true cancer state in order to support clinical decision-making. Plots in Figure A22 show posterior predictions of η from the biopsy and surgery IOP model, as well as predictions of future PSA and biopsy values, for a dozen simulated patients. For each patient, circles represent simulated PSA observations, with the scale given on the left-hand y-axis. Triangles represent simulated biopsies, with open triangles indicating no biopsy in an annual interval (and, thus, no reclassification observed) and filled triangles indicating biopsy results: triangles at the bottom of the plot represent a Gleason score of 6, while those at the top represent a Gleason score of 7 or higher on biopsy. Posterior predictions of each patient's η value are given above the plot. Shaded 95% credible intervals indicate the likely PSA trajectory and risk of reclassification (P(R = 1 | data), scale on right-hand axis) for each patient. The dark green line represents the predicted risk of reclassification on a future biopsy (age indicated on the x-axis) given data observed up until this time. Interpretation is similar for PSA predictions: the dark blue line shows the expected PSA value at a future age given currently observed data. The darkest shading for biopsy and PSA predictions occurs at the center of the posterior distribution (47.5th-52.5th percentiles), and progressively lighter shading is used for central intervals that are each 10 percentage points wider (42.5-57.5, ..., 2.5-97.5 percentiles). Note that no reclassification projection is given for the patient with a biopsy Gleason score of 7 or higher. Since future biopsy outcomes are censored at the time of grade reclassification, post-reclassification biopsy predictions are not supported by this model.
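The nested shading described above can be computed from posterior draws of a prediction at a single future age (for example, predicted log-PSA or P(R = 1 | data)). The sketch returns the ten central intervals from the 47.5th-52.5th percentiles out to the 2.5th-97.5th; the function names are hypothetical, and the quantile rule (linear interpolation) is an assumption.

```python
# Illustrative sketch; function names and quantile rule are assumptions.

def quantile(xs, q):
    """Sample quantile with linear interpolation between order statistics."""
    s = sorted(xs)
    h = (len(s) - 1) * q
    lo = int(h)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (h - lo) * (s[hi] - s[lo])


def nested_bands(draws):
    """Central intervals for progressively lighter shading.

    Returns [(lower_pct, upper_pct, lower_value, upper_value), ...] from the
    narrowest band (47.5, 52.5) to the widest (2.5, 97.5).
    """
    bands = []
    for k in range(10):
        lo_pct, hi_pct = 47.5 - 5 * k, 52.5 + 5 * k
        bands.append((lo_pct, hi_pct,
                      quantile(draws, lo_pct / 100), quantile(draws, hi_pct / 100)))
    return bands
```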
                                                Total observations   Median per patient (IQR)
PSA                                             10,425               10 (6, 16)
Biopsy                                          2,741                3 (1, 4)
Years of follow-up (prior to reclassification)  4,980                5 (3, 8)

Table A1: Summary of observations and follow-up time for n=874 patients included in JHAS analysis.



                          Biopsy Gleason Score
Post-surgical true value  6           ≥7          Total
Indolent, η = 0           66 (69%)    17 (26%)    83
Aggressive, η = 1         30 (31%)    48 (74%)    78
Total                     96          65          161

Table A2: Summary of post-surgical cancer state determination (η) compared to final biopsy-based Gleason score (with column percentages) in JHAS analysis.
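As a consistency check, the counts in Table A2 reproduce the operating characteristics of the final-biopsy classifier used as the benchmark for the FPR comparison: treating a biopsy Gleason score of 7 or higher as a positive call, sensitivity is 48/78 ≈ 0.62 and the false positive rate is 17/83 ≈ 0.20. A short sketch of the calculation (the function name is hypothetical):

```python
# Illustrative sketch; function name is hypothetical.

def biopsy_operating_characteristics(tp, fn, fp, tn):
    """Sensitivity, specificity and false positive rate of the rule
    'final biopsy Gleason score >= 7' for the post-surgical true state eta = 1."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity, 1 - specificity


# Table A2 counts: eta = 1 -> 48 with GS >= 7, 30 with GS 6; eta = 0 -> 17 and 66.
sens, spec, fpr = biopsy_operating_characteristics(tp=48, fn=30, fp=17, tn=66)
print(round(sens, 2), round(fpr, 2))  # 0.62 0.2
```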
Table A3: JHAS results: Latent class distribution parameter ρ, marginal probability of more aggressive cancer (η = 1)

Proposed model variation                Estimate (95% CI)
Unadjusted (no IOP components)          0.30 (0.23, 0.38)
Biopsy IOP component only               0.31 (0.24, 0.39)
Surgery IOP component only              0.20 (0.14, 0.28)
Biopsy and surgery IOP components       0.23 (0.16, 0.33)
Table A4: JHAS results: Stratified multilevel regression for outcome PSA, Y

                                                                                  Estimate (95% CI)
Parameter  Coefficient interpretation      Transformation                         Unadjusted            B IOP                 S IOP                 B, S IOP
μ          Mean intercept, η = 0                                                  1.33 (1.27, 1.38)     1.33 (1.27, 1.38)     1.36 (1.30, 1.40)     1.35 (1.30, 1.40)
           Mean intercept, η = 1                                                  1.6 (1.5, 1.7)        1.6 (1.5, 1.7)        1.6 (1.5, 1.8)        1.6 (1.5, 1.7)
           Mean slope (age), η = 0         Standardized (mean = 67.1, sd = 6.8)   0.24 (0.19, 0.28)     0.24 (0.19, 0.29)     0.26 (0.22, 0.42)     0.26 (0.22, 0.30)
           Mean slope (age), η = 1         Standardized (mean = 67.1, sd = 6.8)   0.50 (0.41, 0.58)     0.48 (0.40, 0.57)     0.53 (0.42, 0.63)     0.50 (0.39, 0.60)
Σ          Standard deviation, intercepts                                         0.54 (0.51, 0.57)     0.54 (0.51, 0.57)     0.54 (0.51, 0.58)     0.54 (0.51, 0.58)
           Standard deviation, slopes                                             0.39 (0.37, 0.42)     0.40 (0.37, 0.43)     0.40 (0.37, 0.43)     0.40 (0.37, 0.43)
           Covariance                                                             0.036 (0.016, 0.057)  0.038 (0.017, 0.059)  0.040 (0.020, 0.061)  0.041 (0.020, 0.063)
β          Fixed effect, prostate volume   Standardized (mean = 57.5, sd = 24.9)  0.31 (0.27, 0.35)     0.31 (0.27, 0.35)     0.31 (0.27, 0.35)     0.31 (0.27, 0.35)
           Residual variance                                                      0.299 (0.294, 0.303)  0.299 (0.294, 0.303)  0.299 (0.294, 0.303)  0.299 (0.294, 0.303)

Table A5: JHAS results: Logistic regression for whether biopsy was performed, B; parameter: ν. Entries are estimates (95% CI).

| Covariate | Transformation | B IOP | B, S IOP |
|---|---|---|---|
| Intercept | | -2.4 (-3.2, -1.6) | -2.4 (-3.2, -1.7) |
| Time since diagnosis | Natural splines, df = 4; knots = (2, 4, 6); boundary = (1, 20) | 0.86 (0.45, 1.3) | 0.88 (0.46, 1.3) |
| | | -0.41 (-1.2, 0.41) | -0.39 (-1.2, 0.47) |
| | | -1.4 (-2.7, -0.14) | -1.4 (-2.7, -0.036) |
| | | -7.5 (-9.9, -5.2) | -7.4 (-9.8, -5.0) |
| Date | Natural splines, df = 4; knots = (4/4/07, 7/11/10, 1/28/13); boundary = (8/17/95, 9/30/15) | 0.74 (0.21, 1.3) | 0.74 (0.19, 1.3) |
| | | 1.2 (0.71, 1.6) | 1.2 (0.71, 1.6) |
| | | 2.1 (0.78, 3.4) | 2.1 (0.79, 3.4) |
| | | -2.5 (-2.9, -2.1) | -2.5 (-2.9, -2.1) |
| Age | Natural splines, df = 2; knots = 69.8; boundary = (46.8, 89.5) | 1.1 (0.18, 2.1) | 1.1 (0.21, 2.0) |
| | | -3.9 (-4.5, -3.3) | -3.9 (-4.5, -3.3) |
| # previous biopsies | Natural splines, df = 2; knots = 3; boundary = (1, 13) | 0.61 (-0.49, 1.6) | 0.58 (-0.63, 1.6) |
| | | 3.9 (2.8, 5.0) | 3.9 (2.7, 5.0) |
| η = 1 | | -0.47 (-0.78, -0.12) | -0.52 (-0.88, -0.17) |
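Several covariates in Tables A5 to A7 enter through natural splines specified by df, interior knots and boundary knots. As a hedged illustration of what such a term contains, the sketch below builds a natural cubic spline basis with a standard truncated-power construction: cubic between the boundary knots, linear outside them, with one basis function per interior knot plus one (so 3 interior knots give df = 4). This is an assumption about the construction, not the authors' code; software such as R's `splines::ns` uses a B-spline basis for the same function space, so the tabulated coefficients cannot be plugged into this basis. The function name `natural_spline_basis` is hypothetical.

```python
def natural_spline_basis(x, interior_knots, boundary_knots):
    """Natural cubic spline basis (truncated-power form) evaluated at x.

    Returns len(interior_knots) + 1 values, matching the df convention in the
    tables (e.g. 3 interior knots -> df = 4). No intercept column is included;
    the regression intercept is estimated separately.
    """
    knots = [boundary_knots[0], *interior_knots, boundary_knots[1]]
    last = knots[-1]

    def cube_plus(u):
        # Truncated cubic (u)_+^3
        return max(u, 0.0) ** 3

    def d(k):
        # d_k(x) = [(x - xi_k)_+^3 - (x - xi_last)_+^3] / (xi_last - xi_k)
        return (cube_plus(x - knots[k]) - cube_plus(x - last)) / (last - knots[k])

    # Subtracting d at the penultimate knot cancels the quadratic terms, which
    # makes every basis function linear beyond the upper boundary knot.
    anchor = d(len(knots) - 2)
    return [x] + [d(k) - anchor for k in range(len(knots) - 2)]


# Time since diagnosis as in Table A5: knots (2, 4, 6), boundary (1, 20).
print(natural_spline_basis(3.0, [2, 4, 6], (1, 20)))
```

Below the lower boundary knot every truncated term is zero, so the basis reduces to x alone; above the upper boundary the cubic and quadratic parts cancel, which is what makes the spline "natural".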
Table A6: JHAS results: Logistic regression for grade reclassification; parameter: γ. Entries are estimates (95% CI).

| Covariate | Transformation | Unadjusted | B IOP | S IOP | B, S IOP |
|---|---|---|---|---|---|
| Intercept | | -3.4 (-4.8, -2.2) | -3.4 (-4.7, -2.2) | -2.9 (-4.2, -1.8) | -2.9 (-4.1, -1.8) |
| Time since diagnosis | Natural splines, df = 2; knots = 2.3; boundary = (0.08, 15.9) | -1.3 (-2.6, -0.10) | -1.0 (-2.2, 0.21) | -1.4 (-2.5, -0.25) | -1.3 (-2.4, -0.17) |
| | | 1.6 (-0.27, 3.4) | 1.7 (-0.085, 3.3) | 1.4 (-0.29, 3.0) | 1.5 (-0.30, 3.0) |
| Date | Natural splines, df = 2; knots = 1/7/09; boundary = (10/25/95, 6/19/14) | 0.007 (-2.4, 2.6) | -0.046 (-2.3, 2.5) | -0.037 (-2.2, 2.3) | -0.10 (-2.2, 2.3) |
| | | 1.1 (0.44, 1.9) | 1.1 (0.43, 1.8) | 1.1 (0.40, 1.7) | 1.0 (0.40, 1.7) |
| Age | Standardized (mean = 67.7, sd = 5.5) | 0.61 (0.38, 0.86) | 0.61 (0.39, 0.86) | 0.56 (0.36, 0.78) | 0.55 (0.35, 0.77) |
| η = 1 | | 2.1 (1.5, 2.7) | 2.1 (1.5, 2.7) | 1.8 (1.1, 2.5) | 1.6 (0.92, 2.3) |




Table A7: JHAS results: Logistic regression for whether surgery was performed, S; parameter: ω. Entries are estimates (95% CI).

| Covariate | Transformation | S IOP | B, S IOP |
|---|---|---|---|
| Intercept | | -6.4 (-8.9, -4.2) | -6.2 (-8.6, -4.0) |
| Time since diagnosis | Natural splines, df = 3; knots = (2, 4, 6); boundary = (1, 20) | 1.8 (0.90, 2.8) | 1.7 (0.82, 2.7) |
| | | 1.3 (-0.84, 3.3) | 1.2 (-0.93, 3.3) |
| | | 6.7 (3.9, 9.5) | 6.7 (3.8, 9.4) |
| | | 2.9 (-2.0, 6.8) | 2.9 (-2.0, 6.8) |
| Date | Natural splines, df = 3; knots = (6/18/08, 4/15/12); boundary = (8/17/95, 9/30/15) | 0.72 (-0.17, 1.7) | 0.67 (-0.22, 1.6) |
| | | -2.1 (-5.2, 1.3) | -2.1 (-5.3, 1.3) |
| | | -0.91 (-1.9, -0.003) | -0.93 (-1.9, -0.020) |
| Age | Natural splines, df = 2; knots = 69.8; boundary = (46.8, 89.6) | -4.9 (-7.3, -2.2) | -5.0 (-7.4, -2.3) |
| | | -11 (-14, -7.8) | -11 (-14, -7.7) |
| # previous biopsies | Standardized (mean = 3.8, sd = 2.3) | -0.45 (-0.87, -0.015) | -0.40 (-0.85, 0.037) |
| Max. previous # positive cores | Standardized (mean = 1.6, sd = 0.9) | 0.36 (0.22, 0.51) | 0.36 (0.22, 0.50) |
| Max. previous max % positive | Natural splines, df = 2; knots = 15; boundary = (1, 100) | 3.7 (2.5, 4.9) | 3.7 (2.5, 4.9) |
| | | 1.1 (0.051, 2.1) | 1.1 (0.015, 2.1) |
| Previous R = 1 | | 1.3 (0.57, 2.0) | 1.2 (0.50, 1.9) |
| η = 1 | | 0.91 (0.13, 1.6) | 0.59 (-0.29, 1.4) |
| Previous R = 1 × η = 1 | | 2.0 (0.70, 3.2) | 2.3 (1.1, 3.6) |
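Table A7 includes main effects for previous R = 1 and η = 1 as well as their interaction, so the association between the latent state and surgery depends on whether reclassification was previously observed. Writing ω_R, ω_η and ω_Rη for these three coefficients (labels introduced here for illustration only) and holding the other covariates fixed, the B, S IOP point estimates give:

```latex
\begin{aligned}
\text{previous } R = 0:&\quad \log \mathrm{OR}_{\eta} = \omega_{\eta} = 0.59, & \mathrm{OR}_{\eta} &= e^{0.59} \approx 1.8\\
\text{previous } R = 1:&\quad \log \mathrm{OR}_{\eta} = \omega_{\eta} + \omega_{R\eta} = 0.59 + 2.3 = 2.89, & \mathrm{OR}_{\eta} &= e^{2.89} \approx 18
\end{aligned}
```

These combine point estimates only; an interval for the sum would require the joint posterior draws of the two coefficients.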



Table A8: JHAS results: Estimated odds ratios for aggressive prostate cancer from a logistic regression analysis of JHAS cohort patients with post-surgery observations of η

| Covariate | Odds ratio (95% CI) |
|---|---|
| Reclassification on biopsy | 4.7 (2.0, 11) |
| Age (years) | 1.1 (0.98, 1.2) |
| # previous biopsies without reclassification | 0.68 (0.41, 1.2) |
| Years in AS | 1.5 (1.0, 2.2) |
| PSA density (×10) | 1.7 (0.61, 4.9) |
| Slope log-PSA (×10) | 1.3 (0.98, 1.6) |
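The odds ratios in Table A8 are exponentiated logistic-regression coefficients. A minimal sketch of that conversion follows; because exp() is increasing, exponentiating the interval endpoints on the log-odds scale gives the interval for the odds ratio. The (×10) rows indicate a rescaled covariate, but the exact convention is not stated here, so the `scale` argument is purely illustrative, and the input coefficients below are hypothetical rather than taken from the table.

```python
import math


def odds_ratio(beta, lower, upper, scale=1.0):
    """Odds ratio (with interval) from a logistic-regression coefficient.

    beta, lower, upper: coefficient and interval bounds on the log-odds scale.
    scale: size of the covariate change; the odds ratio for a `scale`-unit
    increase is exp(scale * beta). Valid for scale > 0, since exp() is
    increasing and so preserves the order of the interval endpoints.
    """
    return tuple(math.exp(scale * b) for b in (beta, lower, upper))


# Hypothetical coefficient and interval (not from Table A8).
est, lo, hi = odds_ratio(0.4, 0.0, 0.8)
print(f"OR = {est:.2f} ({lo:.2f}, {hi:.2f})")  # OR = 1.49 (1.00, 2.23)
```

An interval whose lower bound is below 1, such as (0.41, 1.2) for previous biopsies, corresponds to a log-odds interval that includes 0.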


Table A9: JHAS results: Predictive accuracy among patients with post-surgery η observations. 95% quantile-based bootstrapped intervals are given in parentheses.

| Estimation method | AUC | MSE | FPR at TPR = 0.62 |
|---|---|---|---|
| Proposed model, biopsy and surgery IOP | 0.75 (0.67, 0.83) | 0.201 (0.17, 0.24) | 0.14 (0.07, 0.30) |
| Proposed model, biopsy only IOP | 0.74 (0.66, 0.82) | 0.205 (0.17, 0.24) | 0.17 (0.09, 0.26) |
| Proposed model, surgery only IOP | 0.74 (0.65, 0.81) | 0.207 (0.17, 0.24) | 0.19 (0.08, 0.32) |
| Proposed model, no IOP components | 0.72 (0.64, 0.80) | 0.210 (0.18, 0.25) | 0.19 (0.11, 0.29) |
| Logistic regression | 0.74 (0.66, 0.81) | 0.209 (0.18, 0.24) | 0.19 (0.10, 0.29) |
| Grade reclassification on final biopsy | n/a | 0.292 (0.22, 0.37) | 0.20 (0.12, 0.29) |
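Table A9 compares methods by AUC, MSE and the false positive rate at a fixed true positive rate of 0.62. The sketch below computes all three for binary outcomes and predicted probabilities. AUC uses the Mann-Whitney form (the share of positive/negative pairs ranked correctly, ties counting one half), and MSE is the Brier score. The rule for FPR at a fixed TPR (take the highest cutoff whose TPR reaches the target) is an assumption, since the table does not state how the threshold was chosen; the data are hypothetical.

```python
def auc(outcomes, probs):
    """Area under the ROC curve via the Mann-Whitney statistic (ties count 1/2)."""
    pos = [p for y, p in zip(outcomes, probs) if y == 1]
    neg = [p for y, p in zip(outcomes, probs) if y == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def mse(outcomes, probs):
    """Mean squared error of predicted probabilities (the Brier score)."""
    return sum((p - y) ** 2 for y, p in zip(outcomes, probs)) / len(outcomes)


def fpr_at_tpr(outcomes, probs, target_tpr):
    """False positive rate at the highest threshold whose TPR reaches target_tpr."""
    n_pos = sum(1 for y in outcomes if y == 1)
    n_neg = len(outcomes) - n_pos
    for t in sorted(set(probs), reverse=True):
        tp = sum(1 for y, p in zip(outcomes, probs) if y == 1 and p >= t)
        fp = sum(1 for y, p in zip(outcomes, probs) if y == 0 and p >= t)
        if tp / n_pos >= target_tpr:
            return fp / n_neg
    return 1.0  # only reached if target_tpr > 1


# Hypothetical observed eta values and posterior probabilities.
eta = [1, 1, 0, 0, 1, 0]
p_hat = [0.9, 0.6, 0.4, 0.7, 0.3, 0.1]
print(auc(eta, p_hat), mse(eta, p_hat), fpr_at_tpr(eta, p_hat, 0.62))
```

Fixing the TPR and comparing FPRs puts every method at the same sensitivity, which makes the FPR column directly comparable across rows.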





Table A10: Simulation results: Latent class distribution parameter ρ, the marginal probability of more aggressive cancer (η = 1). The data-generating value was ρ = 0.23.

| Proposed model variation | Estimate | Coverage of 95% interval |
|---|---|---|
| Unadjusted (no IOP components) | 0.34 | 8.5% |
| Biopsy IOP component only | 0.34 | 4.5% |
| Surgery IOP component only | 0.24 | 94% |
| Biopsy and surgery IOP components | 0.25 | 96% |
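Coverage in Tables A10 to A14 is the share of simulated datasets whose 95% interval contains the data-generating value (here ρ = 0.23). A minimal sketch follows, together with the Monte Carlo standard error of a coverage proportion, which helps judge whether values such as 92% and 96% differ meaningfully from the nominal 95%. The number of simulated datasets behind these tables is not stated in this section; the 200 used below is an assumption borrowed from the caption of Table A15, and the intervals are hypothetical.

```python
import math


def coverage(intervals, true_value):
    """Share of (lower, upper) intervals that contain the data-generating value."""
    hits = sum(1 for lower, upper in intervals if lower <= true_value <= upper)
    return hits / len(intervals)


def coverage_mcse(p, n_sims):
    """Monte Carlo standard error of an estimated coverage proportion."""
    return math.sqrt(p * (1 - p) / n_sims)


# Hypothetical 95% intervals for rho from four simulated datasets.
intervals = [(0.10, 0.30), (0.20, 0.40), (0.25, 0.50), (0.00, 0.20)]
print(coverage(intervals, 0.23))           # 0.5
print(round(coverage_mcse(0.95, 200), 3))  # 0.015
```

With about 200 replicates, nominal 95% coverage carries a Monte Carlo standard error of roughly 1.5 percentage points, so 92% to 97% is consistent with nominal coverage, whereas 4.5% or 8.5% indicates a biased estimator.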
Table A11: Simulation results: Stratified multilevel regression for outcome PSA, Y. Entries are estimates (coverage of 95% interval).

| Parameter | Coefficient interpretation | Generating value | Transformation | Unadjusted | B IOP | S IOP | B, S IOP |
|---|---|---|---|---|---|---|---|
| μ | Mean intercept, η = 0 | 1.4 | | 1.3 (95%) | 1.3 (94%) | 1.3 (97%) | 1.3 (96%) |
| μ | Mean intercept, η = 1 | 1.6 | | 1.5 (71%) | 1.6 (74%) | 1.6 (96%) | 1.6 (97%) |
| μ | Mean slope (age), η = 0 | 0.26 | Standardized (mean = 67.1, sd = 6.8) | 0.25 (95%) | 0.25 (94%) | 0.26 (97%) | 0.25 (97%) |
| μ | Mean slope (age), η = 1 | 0.50 | Standardized (mean = 67.1, sd = 6.8) | 0.44 (65%) | 0.44 (66%) | 0.49 (93%) | 0.48 (92%) |
| Σ | Standard deviation, intercepts | 0.54 | | 0.55 (93%) | 0.55 (92%) | 0.54 (93%) | 0.54 (92%) |
| Σ | Standard deviation, slopes | 0.40 | | 0.40 (95%) | 0.40 (94%) | 0.40 (95%) | 0.40 (94%) |
| Σ | Covariance | 0.041 | | 0.042 (94%) | 0.041 (95%) | 0.040 (96%) | 0.040 (94%) |
| β | Fixed effect, prostate volume | 0.31 | Standardized (mean = 57.5, sd = 24.9) | 0.31 (95%) | 0.31 (95%) | 0.31 (95%) | 0.31 (95%) |
| | Residual variance | 0.30 | | 0.30 (95%) | 0.30 (96%) | 0.30 (95%) | 0.30 (95%) |




Table A12: Simulation results: Logistic regression for whether biopsy was performed, B; parameter: ν. Entries are estimates (coverage of 95% interval).

| Covariate | Generating value | Transformation | B IOP | B, S IOP |
|---|---|---|---|---|
| Intercept | -2.4 | | -2.4 (91%) | -2.4 (96%) |
| Time since diagnosis | 0.88 | Natural splines, df = 4; knots = (2, 4, 6); boundary = (1, 20) | 0.91 (99%) | 0.94 (97%) |
| | -0.39 | | -0.42 (96%) | 0.36 (96%) |
| | -1.4 | | -1.1 (96%) | -1.1 (96%) |
| | -7.4 | | -6.9 (98%) | -6.9 (98%) |
| Date | 0.74 | Natural splines, df = 4; knots = (4/4/07, 7/11/10, 1/28/13); boundary = (8/17/95, 9/30/15) | 0.72 (91%) | 0.73 (96%) |
| | 1.2 | | 1.2 (66%) | 1.2 (97%) |
| | 2.1 | | 2.0 (91%) | 2.1 (96%) |
| | -2.5 | | -2.5 (99%) | -2.5 (97%) |
| Age | 1.1 | Natural splines, df = 2; knots = 69.8; boundary = (46.8, 89.5) | 1.1 (96%) | 1.1 (96%) |
| | -3.9 | | -4.0 (96%) | -4.0 (96%) |
| # previous biopsies | 0.58 | Natural splines, df = 2; knots = 3; boundary = (1, 13) | 0.55 (98%) | 0.44 (98%) |
| | 3.9 | | 3.8 (91%) | 3.8 (96%) |
| η = 1 | -0.52 | | -0.40 (66%) | -0.53 (97%) |
Table A13: Simulation results: Logistic regression for grade reclassification, R; parameter: γ. Entries are estimates (coverage of 95% interval).

| Covariate | Generating value | Transformation | Unadjusted | B IOP | S IOP | B, S IOP |
|---|---|---|---|---|---|---|
| Intercept | -2.9 | | -3.6 (90%) | -3.5 (91%) | -3.0 (95%) | -2.9 (96%) |
| Time since diagnosis | -1.3 | Natural splines, df = 2; knots = 2.3; boundary = (0.08, 15.9) | -1.3 (98%) | -1.2 (99%) | -1.3 (97%) | -1.4 (97%) |
| | 1.4 | | 1.4 (95%) | 1.6 (96%) | 1.2 (96%) | 1.3 (96%) |
| Date | -0.07 | Natural splines, df = 2; knots = 1/7/09; boundary = (10/25/95, 6/19/14) | 0.018 (96%) | -0.031 (96%) | 0.098 (95%) | 0.065 (96%) |
| | 1.0 | | 1.1 (98%) | 1.1 (98%) | 1.1 (97%) | 1.1 (98%) |
| Age | 0.55 | Standardized (mean = 67.7, sd = 5.5) | 0.61 (91%) | 0.61 (91%) | 0.56 (96%) | 0.56 (96%) |
| η = 1 | 1.6 | | 2.2 (67%) | 2.1 (66%) | 1.6 (92%) | 1.5 (97%) |




Table A14: Simulation results: Logistic regression for whether surgery was performed, S; parameter: ω. Entries are estimates (coverage of 95% interval).

| Covariate | Generating value | Transformation | S IOP | B, S IOP |
|---|---|---|---|---|
| Intercept | -5.0 | | -5.5 (95%) | -5.3 (96%) |
| Time since diagnosis | 1.8 | Natural splines, df = 3; knots = (2, 4, 6); boundary = (1, 20) | 1.9 (97%) | 1.8 (97%) |
| | 1.2 | | 1.6 (96%) | 1.4 (96%) |
| | 6.7 | | 6.0 (95%) | 5.8 (96%) |
| | 2.8 | | 0.57 (97%) | 0.49 (98%) |
| Date | 0.67 | Natural splines, df = 3; knots = (6/18/08, 4/15/12); boundary = (8/17/95, 9/30/15) | 0.81 (96%) | 0.80 (96%) |
| | -2.1 | | -1.8 (92%) | -1.7 (97%) |
| | -0.93 | | -0.99 (95%) | -0.95 (96%) |
| Age | -5.0 | Natural splines, df = 2; knots = 69.8; boundary = (46.8, 89.6) | -4.9 (97%) | -4.9 (97%) |
| | -11 | | -11 (96%) | -11 (96%) |
| # previous biopsies | -0.40 | Standardized (mean = 3.8, sd = 2.3) | -0.46 (95%) | -0.39 (96%) |
| Previous R = 1 | 1.2 | | 1.3 (97%) | 1.2 (98%) |
| η = 1 | 0.59 | | 0.64 (96%) | 0.46 (96%) |
| Previous R = 1 × η = 1 | 2.3 | | 2.2 (92%) | 2.5 (97%) |
Table A15: Simulation accuracy: Predictive accuracy in simulation studies with data generated according to the biopsy and surgery IOP model. The mean AUC and MSE across 200 simulations are given for patients with and without η observed. 95% quantile-based intervals of the estimated AUC and MSE across all simulations are given in parentheses.

| Estimation method | η observed: AUC (95% int.) | η observed: MSE (95% int.) | η unobserved: AUC (95% int.) | η unobserved: MSE (95% int.) |
|---|---|---|---|---|
| Biopsy and surgery IOP | 0.83 (0.77, 0.88) | 0.17 (0.14, 0.19) | 0.77 (0.72, 0.81) | 0.12 (0.11, 0.16) |
| Biopsy IOP only | 0.81 (0.76, 0.86) | 0.18 (0.15, 0.20) | 0.74 (0.69, 0.78) | 0.16 (0.14, 0.18) |
| Surgery IOP only | 0.81 (0.76, 0.86) | 0.17 (0.15, 0.20) | 0.73 (0.67, 0.78) | 0.13 (0.11, 0.18) |
| Unadjusted (no IOP) | 0.80 (0.74, 0.85) | 0.18 (0.16, 0.20) | 0.71 (0.66, 0.75) | 0.17 (0.14, 0.19) |
| Logistic regression | 0.77 (0.70, 0.83) | 0.19 (0.16, 0.21) | 0.68 (0.63, 0.73) | 0.26 (0.18, 0.56) |
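Both the bootstrapped intervals in Table A9 and the across-simulation intervals in Table A15 are quantile-based: the interval runs from the 2.5th to the 97.5th percentile of a set of replicate estimates. A minimal sketch follows, assuming equal-tailed intervals, linear-interpolation quantiles (the default rule in R and NumPy) and a plain nonparametric bootstrap that resamples patients with replacement; the exact quantile rule and resampling scheme are not stated in this section. The helper names are hypothetical.

```python
import math
import random


def quantile(values, q):
    """Linear-interpolation sample quantile (the default rule in R and NumPy)."""
    xs = sorted(values)
    h = (len(xs) - 1) * q
    lo = math.floor(h)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (h - lo) * (xs[hi] - xs[lo])


def quantile_interval(values, level=0.95):
    """Mean and equal-tailed quantile-based interval of replicate estimates."""
    alpha = (1 - level) / 2
    mean = sum(values) / len(values)
    return mean, quantile(values, alpha), quantile(values, 1 - alpha)


def bootstrap_replicates(data, statistic, n_boot=1000, seed=0):
    """Statistic recomputed on n_boot resamples (with replacement) of data."""
    rng = random.Random(seed)
    n = len(data)
    return [statistic([data[rng.randrange(n)] for _ in range(n)])
            for _ in range(n_boot)]


# Hypothetical per-simulation AUCs (not the paper's results).
aucs = [0.70 + 0.001 * i for i in range(101)]
print(quantile_interval(aucs))
```

For Table A9 the replicates would come from `bootstrap_replicates` applied to (outcome, prediction) pairs with an AUC, MSE or FPR statistic; for Table A15 they are simply the per-simulation estimates.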
[Figure A1 panels: No Surgery, Grade Reclassification (n=95); No Surgery, No Grade Reclassification (n=618); Surgery, No Grade Reclassification (n=96); Surgery, Grade Reclassification (n=65). Legend: No IOP; Biopsy IOP; Surgery IOP; Biopsy, Surgery IOP; Logistic.]

Figure A1: Density plots of posterior predictions of η, stratified by biopsy results and the decision to have surgery, from the JHAS analysis. Line types correspond to different statistical models, as indicated by the legend at the bottom. The vertical dotted line represents the proportion of surgery patients with no grade reclassification (left) and with reclassification (right) on biopsy who had higher-grade prostate cancer on the full prostate examination.
[Figure A2 panels: (a) Proposed model, unadjusted; (b) Proposed model, biopsy IOP; (c) Proposed model, surgery IOP; (d) Logistic regression. x-axis labels: Posterior P(Aggressive PCa); Estimated P(Aggressive PCa).]
Figure A2: Calibration plots for predictions of the true cancer state in the JHAS analysis. (The calibration plot for predictions from the model with both biopsy and surgery IOP components is given in Figure 6(b) of the accompanying paper.)
[Figure A3 panels: Biopsy Performed; Grade Reclassification; Surgical Removal of Prostate. x-axis: Posterior Probability of Outcome. Legend (Observed Cancer State): Aggressive (Gleason > 6); Indolent (Gleason = 6); Unobserved/No Surgery.]


Figure A3: Calibration plots for predictions of the occurrence of a biopsy (left), grade reclassification on biopsy (center), and surgery (right) at annual intervals for all patients in the JHAS cohort. Each solid line represents agreement between the posterior probability and the observed event rate for a single iteration of the sampling algorithm for the proposed model with surgery and biopsy IOP components.
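Figures A2, A3, A20 and A21 are calibration plots, which compare predicted probabilities with observed event rates. The captions do not say whether the curves were binned or smoothed; as an assumed, simple stand-in, the sketch below computes a binned calibration summary for one set of predicted probabilities (for example, the posterior probabilities from a single sampling iteration, as in Figure A3). Points near the 45-degree line indicate good calibration. The function name and bin count are hypothetical.

```python
def calibration_bins(probs, outcomes, n_bins=10):
    """Binned calibration: mean predicted probability vs observed event rate.

    Predictions are grouped into equal-width bins on [0, 1]; for each non-empty
    bin this returns (mean predicted probability, observed event rate, count).
    """
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        k = min(int(p * n_bins), n_bins - 1)  # put p == 1.0 in the top bin
        bins[k].append((p, y))
    summary = []
    for b in bins:
        if b:
            mean_p = sum(p for p, _ in b) / len(b)
            rate = sum(y for _, y in b) / len(b)
            summary.append((mean_p, rate, len(b)))
    return summary


# Hypothetical predicted probabilities and observed 0/1 events.
for mean_p, rate, n in calibration_bins([0.05, 0.15, 0.12, 0.95, 1.0], [0, 1, 0, 1, 1]):
    print(f"predicted {mean_p:.3f}  observed {rate:.2f}  (n={n})")
```

Repeating this for each retained sampling iteration and overlaying the resulting curves gives one line per iteration, which conveys posterior uncertainty in calibration.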
Figure A4: Patient-level intercept (x-axis) and slope (y-axis) estimates from the multilevel PSA model in the proposed model with biopsy and surgery IOP components.
Figure A5: Plots to assess convergence for ρ, the marginal probability of η = 1, for the proposed model with biopsy and surgery IOP components applied to JHAS data: trace plots for five sampling chains, indicated by color (left); a cumulative quantile plot for a representative sampling chain (center); and a plot comparing the prior (dotted line) with the posterior density (solid lines) (right).
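Each convergence figure (A5 to A17) pairs trace plots with a cumulative quantile plot: the running quantiles of one chain's draws up to each iteration. A minimal sketch follows, assuming the 2.5%, 50% and 97.5% quantiles are tracked (the captions do not say which quantiles are shown) and using linear-interpolation quantiles. Curves that flatten out suggest the chain has settled into its stationary distribution. The function names are hypothetical.

```python
import math


def quantile(sorted_xs, q):
    """Linear-interpolation quantile of an already sorted list."""
    h = (len(sorted_xs) - 1) * q
    lo = math.floor(h)
    hi = min(lo + 1, len(sorted_xs) - 1)
    return sorted_xs[lo] + (h - lo) * (sorted_xs[hi] - sorted_xs[lo])


def cumulative_quantiles(chain, probs=(0.025, 0.5, 0.975)):
    """Quantiles of chain[:t] for t = 1..len(chain).

    Re-sorting at every step costs O(n^2 log n) overall, which is fine for a
    sketch but slow for long chains (thinning the plotted t values helps).
    """
    out = []
    for t in range(1, len(chain) + 1):
        xs = sorted(chain[:t])
        out.append(tuple(quantile(xs, q) for q in probs))
    return out


# Hypothetical draws from one sampling chain.
draws = [3.0, 1.0, 2.0, 2.5]
for t, (lo, med, hi) in enumerate(cumulative_quantiles(draws), start=1):
    print(t, round(lo, 3), med, round(hi, 3))
```

Plotting each tuple component against t gives the three curves of a cumulative quantile plot for a single chain.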
Figure A6: Plots to assess convergence for β, the population-level coefficient for prostate volume in the multilevel regression model for PSA. Plots are from the joint posterior of the proposed model with biopsy and surgery IOP components applied to JHAS data: trace plots for five sampling chains, indicated by color (left); a cumulative quantile plot for a representative sampling chain (center); and a plot comparing the prior (dotted line) with the posterior density (solid lines) (right).
Figure A7: Plots to assess convergence for μ, the mean intercept and slope for the latent classes in the multilevel regression model for PSA. Plots are from the joint posterior of the proposed model with biopsy and surgery IOP components applied to JHAS data: trace plots for five sampling chains, indicated by color (left); a cumulative quantile plot for a representative sampling chain (center); and a plot comparing the prior (dotted line) with the posterior density (solid lines) (right).
Figure A8: Plots to assess convergence for the variance of patient-level intercepts and slopes, the covariance between patient-level intercepts and slopes, and the residual variance for the multilevel PSA regression model. Plots are from the joint posterior of the proposed model with biopsy and surgery IOP components applied to JHAS data: trace plots for five sampling chains, indicated by color (left); a cumulative quantile plot for a representative sampling chain (center); and a plot comparing the prior (dotted line) with the posterior density (solid lines) (right).
Figure A9: Plots to assess convergence for coefficients in the logistic regression for the patient decision to receive an annual biopsy; associated covariates are indicated on the far-left y-axis. Plots are from the joint posterior of the proposed model with biopsy and surgery IOP components applied to JHAS data: trace plots for five sampling chains, indicated by color (left); a cumulative quantile plot for a representative sampling chain (center); and a plot comparing the prior (dotted line) with the posterior density (solid lines) (right).
Figure A10: Plots to assess convergence for coefficients in the logistic regression for the patient decision to receive an annual biopsy; associated covariates are indicated on the far-left y-axis. Plots are from the joint posterior of the proposed model with biopsy and surgery IOP components applied to JHAS data: trace plots for five sampling chains, indicated by color (left); a cumulative quantile plot for a representative sampling chain (center); and a plot comparing the prior (dotted line) with the posterior density (solid lines) (right).
Figure A11: Plots to assess convergence for coefficients in the logistic regression for the patient decision to receive an annual biopsy; associated covariates are indicated on the far-left y-axis. Plots are from the joint posterior of the proposed model with biopsy and surgery IOP components applied to JHAS data: trace plots for five sampling chains, indicated by color (left); a cumulative quantile plot for a representative sampling chain (center); and a plot comparing the prior (dotted line) with the posterior density (solid lines) (right).
Figure A12: Plots to assess convergence for γ, coefficients in the logistic regression for grade reclassification on biopsy; associated covariates are indicated on the far-left y-axis. Plots are from the joint posterior of the proposed model with biopsy and surgery IOP components applied to JHAS data: trace plots for five sampling chains, indicated by color (left); a cumulative quantile plot for a representative sampling chain (center); and a plot comparing the prior (dotted line) with the posterior density (solid lines) (right).
Figure A13: Plots to assess convergence for γ, coefficients in the logistic regression for grade reclassification on biopsy; associated covariates are indicated on the far-left y-axis. Plots are from the joint posterior of the proposed model with biopsy and surgery IOP components applied to JHAS data: trace plots for five sampling chains, indicated by color (left); a cumulative quantile plot for a representative sampling chain (center); and a plot comparing the prior (dotted line) with the posterior density (solid lines) (right).
Figure A14: Plots to assess convergence for ω, coefficients in the logistic regression for the patient decision to undergo surgical removal of the prostate; associated covariates are indicated on the far-left y-axis. Plots are from the joint posterior of the proposed model with biopsy and surgery IOP components applied to JHAS data: trace plots for five sampling chains, indicated by color (left); a cumulative quantile plot for a representative sampling chain (center); and a plot comparing the prior (dotted line) with the posterior density (solid lines) (right).
Figure A15: Plots to assess convergence for ω, coefficients in the logistic regression for the patient decision to undergo surgical removal of the prostate; associated covariates are indicated on the far-left y-axis. Plots are from the joint posterior of the proposed model with biopsy and surgery IOP components applied to JHAS data: trace plots for five sampling chains, indicated by color (left); a cumulative quantile plot for a representative sampling chain (center); and a plot comparing the prior (dotted line) with the posterior density (solid lines) (right).
Figure A16: Plots to assess convergence for ω, coefficients in the logistic regression for the patient decision to undergo surgical removal of the prostate; associated covariates are indicated on the far-left y-axis. Plots are from the joint posterior of the proposed model with biopsy and surgery IOP components applied to JHAS data: trace plots for five sampling chains, indicated by color (left); a cumulative quantile plot for a representative sampling chain (center); and a plot comparing the prior (dotted line) with the posterior density (solid lines) (right).
Figure A17: Plots to assess convergence for ω, coefficients in the logistic regression for the patient decision to undergo surgical removal of the prostate; associated covariates are indicated on the far-left y-axis. Plots are from the joint posterior of the proposed model with biopsy and surgery IOP components applied to JHAS data: trace plots for five sampling chains, indicated by color (left); a cumulative quantile plot for a representative sampling chain (center); and a plot comparing the prior (dotted line) with the posterior density (solid lines) (right).
Figure A18: Posterior distributions for biopsy and surgery IOP coefficients under vague and informative priors. The vertical line is drawn at log-OR = 0.
Figure A19: Density plots of posterior predictions of η, stratified by biopsy results and the decision to have surgery, from a single simulated dataset. Line types correspond to different statistical models, as indicated by the legend at the bottom. The vertical dotted line represents the proportion of surgery patients with no grade reclassification (left) and with reclassification (right) on biopsy who had higher-grade prostate cancer on the full prostate examination.
Figure A20: Calibration plots among all patients for predictions of the true cancer state in one simulated dataset.
Figure A21: Calibration plots among patients with true state observations (η known) for predictions of the true cancer state in one simulated dataset.
Figure A22: Simulated PSA (circles) and reclassification (triangles) data for a dozen patients. The vertical position of filled triangles indicates the results of biopsies received: triangles at the bottom represent Gleason 6 observations, while those on top represent Gleason 7 or above; open triangles indicate missed biopsies. Posterior probabilities of having aggressive prostate cancer (PCa) are shown above each patient's data. Shaded intervals show pointwise posterior credible intervals around projected PSA and reclassification trajectories, with shading gradations indicating deciles of the interval and the darkest shading occurring at the posterior median.